Requirements for the Degree of Doctor of Philosophy

A VALIDATION  STUDY  OF  SEVERAL 
GERIATRIC  BEHAVIOR  RATING  SCALES 

By 

Paul  K.  Chafetz 
March,  1981 

Chairman:  Walter  R.  Cunningham,  Ph.D. 

Major  Department:  Department  of  Clinical  Psychology 

Five scales designed to measure Activities of Daily Living skills
in geriatric patients were studied: the Physical Self-Maintenance
Scale, the Rapid Disability Rating Scale, Factor 3 of the Geriatric
Rating Scale, the Minimal Social Behavior Scale, and the Performance
Test of Activities of Daily Living. The first three are subject-absent
scales, meaning that raters complete them on the basis of their
knowledge of the subject. The last two are subject-present scales, in
that they are administered directly to each subject individually. The
interrater reliability (IRR), construct validity, and concurrent
criterion-related validity of each were evaluated. Hospitalized male
veterans (N=110) on neurology, medicine, and psychiatry wards, and
nonhospitalized male veterans (N=30) served as subjects.

Dual  ratings  of  50  subjects  yielded  excellent  correlational  IRR 
results,  ranging  from  .853  to  .985,  all  significant  at  p=.0001. 
Construct  validity  of  the  scales  proved  very  strong  through  both 
correlational and factor analytic techniques. The two-scale correlations
ranged from .498 to .893 in absolute value and were all
significant at p=.0001. A principal components analysis yielded one
factor, which accounted for 79% of the total variance, and the scale
loadings on this factor ranged from .793 to .942.
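The following Python sketch makes the variance figure concrete. It shows how two quantities can be obtained from a correlation matrix: the share of total variance carried by the first principal component, and each scale's loading on that component. It is an illustration under stated assumptions, not the analysis program used in this study. The power-iteration routine and the function names are hypothetical. The five-scale matrix in the example is also invented (every off-diagonal r = .75) and is not the study's data. Because the trace of a correlation matrix equals the number of variables, the variance share is the largest eigenvalue divided by that number.

```python
import math


def mat_vec(matrix, vector):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def first_principal_component(corr, max_iter=1000, tol=1e-12):
    """Largest eigenvalue and unit eigenvector of a correlation matrix, by power iteration."""
    n = len(corr)
    # A non-uniform start vector avoids beginning exactly orthogonal to the
    # dominant eigenvector (possible when some scales correlate negatively).
    v = [1.0 + i for i in range(n)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    eigenvalue = 0.0
    for _ in range(max_iter):
        w = mat_vec(corr, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
        # The Rayleigh quotient of the normalized vector estimates the eigenvalue.
        new_value = sum(a * b for a, b in zip(v, mat_vec(corr, v)))
        if abs(new_value - eigenvalue) < tol:
            return new_value, v
        eigenvalue = new_value
    return eigenvalue, v


def first_factor_summary(corr):
    """Proportion of total variance and loadings for the first principal component."""
    eigenvalue, vector = first_principal_component(corr)
    share = eigenvalue / len(corr)  # trace of a correlation matrix = number of variables
    sign = 1.0 if sum(vector) >= 0 else -1.0  # eigenvector sign is arbitrary
    loadings = [sign * x * math.sqrt(eigenvalue) for x in vector]
    return share, loadings


# Hypothetical five-scale matrix with every intercorrelation equal to .75.
example = [[1.0 if i == j else 0.75 for j in range(5)] for i in range(5)]
share, loadings = first_factor_summary(example)
print(f"variance share: {share:.3f}")
print("loadings:", [round(x, 3) for x in loadings])
```

For the invented matrix, the largest eigenvalue is 1 + 4(.75) = 4.0. The first component therefore accounts for 80% of the variance, and every loading equals the square root of .80, about .894. Loading signs are arbitrary. Scales scored in opposite directions would load with opposite signs, which is why correlations and loadings of this kind are often reported in absolute value.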
Each scale's criterion-related validity was evaluated along each of the
following dimensions: (1) age among male neurology
inpatients, (2) type of impairment among elderly male inpatients,
within broad categories, (3) presence of neurological pathology among
males, and (4) inpatient status among elderly males. For each
criterion, the appropriate subsamples were grouped, and a Hotelling's
T-squared analysis and a least significant difference post-hoc multiple
comparison were conducted. With just one scale-by-criterion exception,
every scale clearly demonstrated validity along all four dimensions.
The hit rate of the scales in discriminating elderly neurological and
psychiatric patients ranged from 65.00% to 80.85%.
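The abstract names Hotelling's T-squared but does not give its computation. As an illustration only, the Python sketch below computes the two-sample statistic, T^2 = [n1 n2 / (n1 + n2)] d' S^-1 d. Here d is the difference between the two groups' mean score vectors and S is their pooled covariance matrix. The sketch also applies the standard conversion of T^2 to an F ratio with p and n1 + n2 - p - 1 degrees of freedom. It assumes two groups and complete data. Criteria with more than two subsamples call for a multivariate analysis of variance instead. The least significant difference follow-up comparisons are not shown. No p value is computed, because that would require an F-distribution routine. Function names and scores are hypothetical.

```python
def mean_vector(rows):
    """Column means of a list of equal-length score vectors."""
    n = len(rows)
    return [sum(row[j] for row in rows) / n for j in range(len(rows[0]))]


def sscp(rows, mean):
    """Sums of squares and cross-products of deviations from the mean."""
    p = len(mean)
    out = [[0.0] * p for _ in range(p)]
    for row in rows:
        dev = [row[j] - mean[j] for j in range(p)]
        for i in range(p):
            for j in range(p):
                out[i][j] += dev[i] * dev[j]
    return out


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def hotelling_t2(group1, group2):
    """Two-sample Hotelling's T-squared, its F equivalent, and the F degrees of freedom."""
    n1, n2, p = len(group1), len(group2), len(group1[0])
    m1, m2 = mean_vector(group1), mean_vector(group2)
    s1, s2 = sscp(group1, m1), sscp(group2, m2)
    # Pooled covariance matrix: combined SSCP divided by n1 + n2 - 2.
    pooled = [[(s1[i][j] + s2[i][j]) / (n1 + n2 - 2) for j in range(p)] for i in range(p)]
    d = [a - b for a, b in zip(m1, m2)]
    t2 = (n1 * n2 / (n1 + n2)) * sum(di * xi for di, xi in zip(d, solve(pooled, d)))
    f = (n1 + n2 - p - 1) / (p * (n1 + n2 - 2)) * t2
    return t2, f, (p, n1 + n2 - p - 1)


# Invented scores on two scales for two small groups of subjects.
group_a = [[1, 2], [2, 1], [3, 3]]
group_b = [[4, 2], [5, 1], [6, 3]]
print(hotelling_t2(group_a, group_b))
```

For these invented scores, the groups differ only on the first scale. The sketch gives T-squared = 18 and F = 6.75 on 2 and 3 degrees of freedom. With a single scale, the statistic reduces to the square of the ordinary pooled-variance t.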

Several  methodological  difficulties  emerged  during  this  study. 
First,  subject  recruitment  procedures  were  less  systematic  and 
rigorous  in  some  samples  than  others  and  may  have  caused  slight  bias 
in  some  sample  means.  Second,  many  subjects  had  two  or  three  of  the 
subject-absent  scales  completed  by  the  same  rater,  bringing  into 
question the adequate independence of these ratings. With
nonhospitalized subjects, more rigorous rater training would have been
desirable. Third, the participation rate of non-whites in the study
varied widely between samples, from 0.0% to 23.3%. However, no scale
correlated significantly with race for the entire sample, and only one
scale-by-race correlation was significant among the individual
samples.

The highly favorable psychometric results with every scale evaluated
leave potential test users with choices of both method (i.e.,
subject-absent vs. subject-present) and specific instruments. Subject-absent
scales are probably the technology of choice (1) whenever many
indigenous raters are available and trainable, (2) when there may be a
lack  of  subject  motivation  to  be  assessed,  and  (3)  especially  when 
retesting  of  subjects  is  planned  or  desirable.  This  will  often  be  the 
case  in  residential  institutions.  Subject-present  scales,  on  the 
other  hand,  seem  most  appropriate  for  research  and  screening  purposes 
involving  one-time  evaluation  of  subjects.  The  present  data  recommend 
the  Physical  Self-Maintenance  Scale  and  the  Performance  Test  of 
Activities  of  Daily  Living  most  highly. 

Future research should emphasize the collection of good normative
data with these scales and the extension of studies like the present one
to female populations.
CHAPTER I
INTRODUCTION 

Objective 

The overall objective of the present study was to promote the
quality of future gerontological research by identifying some
behavioral measurement instruments which possess adequate psychometric
characteristics to justify continued use. Only scales which are
clearly applicable to neurologically impaired geriatric
patients were chosen. The project clarifies the appropriateness of
the scales both for individual assessment of such patients and for
treatment outcome research with groups of such patients.

Background 

During the twentieth century, there has been considerable progress
on such social frontiers as medical science, medical technology,
and occupational safety. This has been accompanied by a dramatic
growth in the ranks of the elderly, both in terms of their numbers and
as  a percentage  of  the  American  population.  Between  1870  and  1970, 
the  number  of  Americans  age  65  and  over  increased  from  1.2  million  to 
approximately  20  million.  The  portion  of  the  total  population  they 
comprise  grew  from  2.9%  to  9.9%.  Estimates  of  the  number  of  elderly 
Americans  in  the  year  2000  range  from  28.8  million  (Brotman,  1973)  to 
33  million  (Butler,  1969).  The  proportion  of  elderly  veterans  is  also 
growing.  In  1975,  13%  of  American  veterans  were  age  60  or  over,  but 
this  statistic  has  been  projected  to  reach  39%  by  the  year  2000 
(Kirchoff,  1977). 
While the majority of elderly persons are in relatively good
health  and  are  functioning  independently,  advancing  age  is  still 
associated  with  decline  of  function.  Of  those  75  and  over,  23.7%  are 
unable  to  carry  on  major  activities,  as  opposed  to  9.7%  of  those 
between  65  and  75  years  of  age  (Busse,  1969).  Almost  1%  of  all  those 
over  65  are  in  some  type  of  mental  institution.  An  additional  3%  are 
in  other  types  of  institutions,  such  as  nursing  homes,  chronic  disease 
hospitals,  and  homes  for  the  aged  (Redick,  Kramer,  & Taube,  1973). 

Among the roughly one million elderly persons who are institutionalized,
organic brain disorders are extremely widespread, though
their  prevalence  in  different  institutions  varies  according  to  certain 
facility-specific  selection  factors.  Thus,  a study  of  state  hospitals 
in  North  Carolina  indicated  that,  of  those  admitted  for  the  first  time 
after  age  65,  79%  had  a diagnosis  of  acute  or  chronic  brain  syndrome 
(Whanger, 1971), whereas only 42% of elderly residents in private
psychiatric  facilities  had  organic  brain  syndromes  (Redick,  Kramer,  & 
Taube,  1973). 

In  response  to  these  population  trends  and  institutional 
parameters,  geriatricians  are  becoming  increasingly  innovative  in 
their  search  for  new  and  effective  treatments.  Examples  range  from 
drugs  to  planned  psychosocial  environments.  Notable  among  recent 
developments  are  reality  orientation  therapy  (Folsom,  1968),  Gerovital 
(Jarvik  & Milne,  1975),  and  hyperbaric  oxygenation  (Eisner,  1975; 
Thompson,  1975).  The  effectiveness,  and  in  some  cases  even  the  use, 
of  many  recently  developed  therapies  is  still  quite  controversial. 
These debates, which are carried on by concerned theorists,
administrators, and clinicians by way of scientific journals, should
be  based  on,  and  ultimately  resolved  with,  sound  empirical  data. 

Psychometric  Criteria  for  Outcome  Measures 

"Sound"  data  are  those  which  have  been  obtained  from  measurement 
techniques  or  instruments  of  demonstrated  reliability  and  validity. 
Why  is  this  an  important  requirement?  Use  of  data-gathering  methods 
without  these  characteristics  creates  a situation  in  which  it  cannot 
be  said  with  certainty  what  the  data  represent--the  intended  variable, 
some  systematic  intervening  variable,  experimenter  or  rater  bias, 
methodological  or  statistical  artifact,  or  random  error.  When  the 
results  of  studies  which  utilized  instruments  of  unknown  reliability 
or  validity  are  used  to  formulate  policy  and  plan  future  investments 
of  institutional  resources,  the  risk  of  being  misled  by  speciously 
"empirical"  data  can  be  very  great.  This  pitfall  is  best  avoided  by 
choosing  dependent  measures  whose  psychometric  characteristics  are 
already  known  to  be  acceptable.  In  geriatrics,  just  as  in  other 
fields,  the  speedy  development  of  effective  and  efficient  health  care 
delivery  systems  requires  a dependable  technology  for  accurate  program 
evaluation. 

Reliability  refers,  in  general,  to  the  consistency  of  scores 
obtained  with  a measurement  instrument.  A test  should  be  able  to 
yield,  with  a single  application,  a trustworthy  estimate  of  the  "true" 
level  of  the  studied  variable  at  the  time  of  measurement.  The 
presence  of  excessive  score  variability  (inconsistency)  brings  into 
serious question the test's ability to do this. A sine qua non for
scales  intended  for  use  in  research  of  any  kind,  adequate  reliability 
is  absolutely  essential  if  the  scale  is  to  be  sensitive  to  changes  in 
the  measured  variable. 
A scale's reliability must be estimated by comparing the results
of multiple measurements of a sample for agreement. Several methods
exist for making such comparisons, each addressing different sources
of error variance. Test-retest reliability measures error due to
time, but confounds time with possible practice effects and with real
change over time. Alternate forms of many tests are available to help
avoid practice effects in test-retest comparisons, but use of
alternate forms may present the difficulty of unequal content
sampling. By computing split-half reliability on data gathered within
a single test situation, the effect of content sampling can be isolated.
Interrater reliability is the agreement of two or more observers who
simultaneously  score  the  same  sequence  of  events.  High  interrater 
reliability  suggests  that  the  scores  obtained  by  one  rater  would  be 
very  similar  to  scores  obtained  by  any  other  comparably  trained 
rater(s)  who  had  rated  the  same  events.  In  such  cases,  duplication  of 
effort  (through  multiple  rating)  is  less  necessary,  thus  making 
assessment  more  efficient  and  accurate.  Interrater  reliability  has 
been  termed  "the  primary  reliability  concern  in  applied  behavior 
research"  (Walls,  Werner,  Bacon,  & Zane,  1977,  p.  83). 
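The reliability coefficients discussed above are, in their simplest form, correlations between two sets of scores for the same subjects. The Python sketch below is a minimal illustration that rests on two assumptions. First, correlational interrater reliability is taken as the Pearson r between two raters' total scores. Second, split-half reliability is computed from an odd-even item split and then stepped up with the Spearman-Brown formula, 2r / (1 + r). That formula is a standard correction for halving test length, but the text above does not itself mention it. Function names and data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def interrater_r(rater_a, rater_b):
    """Correlational interrater reliability: two raters' totals for the same subjects."""
    return pearson_r(rater_a, rater_b)


def split_half_reliability(item_scores):
    """Odd-even split-half r, corrected to full length with Spearman-Brown.

    item_scores holds one list of item scores per subject.
    """
    odd_half = [sum(items[0::2]) for items in item_scores]
    even_half = [sum(items[1::2]) for items in item_scores]
    r_half = pearson_r(odd_half, even_half)
    return 2 * r_half / (1 + r_half)


# Hypothetical totals from two raters who scored the same five subjects.
print(round(interrater_r([12, 30, 25, 8, 40], [14, 29, 27, 9, 38]), 3))
# Hypothetical two-item responses from three subjects.
print(round(split_half_reliability([[1, 1], [2, 3], [3, 2]]), 3))
```

In the second call, the two half-test totals correlate .50, and the Spearman-Brown formula steps this up to about .67. A correlation indexes consistency of rank order, not absolute agreement. A rater who scores every subject exactly two points higher than a colleague would therefore still yield r = 1.0.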

Validity,  like  reliability,  is  not  a unitary  concept.  It  refers, 
in  general,  to  the  extent  to  which  a test  actually  measures  the 
particular  construct  it  is  supposed  to  measure.  It  is  also  related  to 
the  amount  of  confidence  with  which  a test  user  may  apply  an 
individual's  test  score  in  making  a decision  about  that  individual. 
It  is  important  to  remember,  however,  that  questions  of  a scale's 
validity  can  be  meaningfully  explored  only  after  adequate  reliability 
has been demonstrated. Once reliability has been established, the
slower process of assessing the scale's several validities can begin
(Standards for Educational and Psychological Tests, 1974).

To  demonstrate  content  validity,  one  must  show  that  the  behaviors 
elicited  in  testing  constitute  a representative  sample  of  all  possible 
behaviors  in  the  specified  performance  domain.  This  should  not  be 
confused with relatively worthless judgments of appearances of relevance
that have in the past been termed "face validity." Construct
validity is of two types--convergent and discriminant. Convergent
validity  is  demonstrated  when  the  test  scores  on  a sample  of  subjects 
correlate  highly  with  other  measures  of  the  same  construct  made  on  the 
same  subjects.  Discriminant  validity  requires  that  the  results  of 
the scale vary independently of the subjects' scores on tests
designed to measure different constructs or variables (Campbell &
Fiske, 1959). There are also two types of criterion-related validity.
Concurrent  criterion-related  validity  is  the  extent  to  which  an 
individual's  test  score  is  useful  in  estimating  his  present  standing 
on  another  variable.  Predictive  criterion-related  validity,  however, 
involves  a time  element,  and  indicates  the  extent  to  which  the 
individual's  future  level  on  some  criterion  can  be  predicted  from  a 
knowledge  of  prior  test  scores.  Important  in  this  regard  is  how  much 
the  knowledge  of  test  scores  improves  prediction  to  the  criterion  over 
simply using population base rates. The magnitude of this improvement,
often called incremental validity, depends not only on the power
of  the  test,  but  also  on  the  behavior's  base  rate  (Meehl  & Rosen, 
1955).  The  closer  the  base  rate  of  a certain  characteristic  or 
behavior  in  a given  population  is  to  .5,  the  easier  it  will  be  to 
construct  a test  which  will  significantly  improve  the  clinician's 
ability to predict the emergence of that characteristic or behavior.
However,  prediction  of  events  which  are  already  known  to  have  very  low 
base  rates  (such  as  changes  of  eye  color),  or  very  high  base  rates 
(such  as  dying  by  age  90),  will  be  improved  very  little  by  adding  test 
information  to  base  rate  information. 

Geriatric  Treatment  Program  Outcome  Research 

A major  compilation  of  rating  scales  for  geriatric  research,  with 
an  emphasis  on  psychopharmacological  research  (Salzman,  Kochansky,  & 
Shader,  1972),  listed  117  such  instruments,  including  42  behavior 
rating  scales.  Since  the  publication  of  that  review,  dozens  of 
additional  scales  have  appeared,  apparently  resulting  from  the 
burgeoning  interest  in  aging.  Unfortunately,  data  on  the  validity, 
and  especially  on  the  reliability,  of  the  vast  majority  of  these 
scales are often either inadequate or nonexistent.

It is, of course, the test user's responsibility to marshal
evidence  in  support  of  any  instrument  to  be  used  to  help  make 
decisions  (Standards  for  Educational  and  Psychological  Tests,  1974,  p. 
32).  Yet,  how  can  the  individual  gerontological  researcher  hope  to 
choose  wisely  from  the  plethora  of  available  scales?  The  most  recent 
Mental Measurements Yearbook (Buros, 1972) reviews none of the 42
behavior rating scales listed by Salzman et al. (1972), nor, of course, any of those
which  have  appeared  since  1972.  The  Reality  Orientation  Program  at 
the  Tuscaloosa  Veterans  Administration  (VA)  Hospital  supplied  a brief 
annotated list (List of Research and Evaluation Tools, 1976) of rating
scales,  but  without  critical  comments  or  psychometric  data.  Walls, 
Werner,  Bacon,  and  Zane  (1977)  list  166  behavior  checklists  along  with 
"selective  descriptive  notes."  Many  of  these  scales  may  be  applicable 
to elderly populations, although they are not marked as such. Each
scale  is  rated  for  objectivity,  and  the  availability  of  reliability 
and  validity  information  is  described  (i.e.,  yes  or  no).  Actual 
information  as  to  the  reliability  and  validity  of  each  scale,  however, 
is  not  included. 

Many  clinicians  and  researchers  have  need  of  good  dependent 
measures  for  geriatric  program  research  and  clinical  assessment. 
Most,  however,  probably  do  not  have  the  time,  resources,  or  training 
to  search  out  and  evaluate  what  reliability  and  validity  relationships 
have  been  demonstrated.  Also,  it  would  be  quite  wasteful  for  each 
interested party to be forced to perform the same search and evaluation
as others before them. The inevitable product of this
unfortunate  situation  is  that  much  serious  research  is  conducted  using 
measures  of  unknown  dependability.  Many  choices  of  measurement 
instruments are certainly haphazard, and this can only lead to
haphazard findings.

It may be argued that accurate measurement of discrete, well-defined
behaviors is all the user should require of a behavioral
rating  scale,  and  that  the  exploration  of  sophisticated  questions  of 
reliability  and  validity  is  either  superfluous  or  inappropriate. 
After  all,  the  correlation  of  Scale  X with  Scale  Y may  be  of  little 
interest  as  long  as  Scale  X data  can  indicate  a patient's  ability  to, 
for  example,  wash  his  face.  Indeed,  if  the  user  is  truly  interested 
only  in  the  precise  behaviors  actually  rated,  validation  efforts  are 
quite  unnecessary  (although  reliability  is  still  required).  While 
such  usage  of  behavioral  rating  scales  can  be  perfectly  legitimate, 
however,  it  is  certain  that  behavioral  rating  scales  of  the  type 
discussed here are usually used in order to obtain an estimate of a
patient's  competence  in  certain  areas  of  performance.  Each  "area" 
subsumes  many  discrete,  but  functionally  related,  motor  behaviors. 
While  any  reliable  scale  can  supply  accurate  information  about  the 
specifically  rated  behaviors,  this  information  can  be  used  to  make 
inferences about the patient's competence in that class of behaviors
only  if  the  scale's  validity  has  been  demonstrated. 

Reality Orientation Therapy Outcome Research

One  body  of  literature  in  which  use  of  inadequate  dependent 
measures is all too common is reality orientation therapy (ROT) outcome
research. Five such studies will be reviewed here to illustrate
the  problems  discussed  above.  ROT  was  developed  at  the  Topeka  and 
Tuscaloosa  Veterans  Administration  (VA)  hospitals  for  the  treatment  of 
geriatric  inpatients  with  a moderate  to  severe  degree  of  memory  loss, 
confusion,  and  disorientation.  It  is  designed  to  combat  the  process 
of cognitive deterioration in two ways--the patient is continually
stimulated through the repeated presentation of fundamental information,
and he is placed in a group where he meets and competes with
other  patients,  thus  forcing  him  out  of  his  isolation  and  back  into 
his  environment  (Stephens,  1970). 

Letcher, Peterson, and Scarbrough (1974) devised a four-level
classification  of  nursing  care  requirement  to  assess  the  effectiveness 
of  the  geriatric  rehabilitation  program  at  the  Tuscaloosa  VA  hospital. 
Although  24-hour  and  classroom  ROT  was  a central  element  of  this 
program,  the  absence  of  control  data  made  it  impossible  to  isolate  the 
role  of  ROT  from  that  of  other  program  components.  Subjects  were  125 
male  inpatients,  whose  mean  age  was  82.8  years.  Eighty  percent  were 
diagnosed as having organic brain syndrome secondary to cerebral
arteriosclerosis,  and  another  15%  had  suffered  strokes.  All  subjects 
also  had  at  least  one  additional  major  medical  problem.  Treatment 
duration,  which  was  recorded  but  not  manipulated  or  controlled,  ranged 
from  2 to  56  months.  The  "experiment"  involved  no  control  groups,  no 
assessment  of  the  classification  scheme's  reliability,  and  no  tests  of 
significance.  Rather,  percentages  of  the  sample  which  required  each 
level  of  nursing  care  before  and  after  the  individual's  ROT  term  are 
supplied.  That  is,  pre-ROT  evaluation  by  the  treatment  team  rated  .8% 
of  the  sample  at  level  I (minimal  care),  8.0%  at  level  II,  23.2%  at 
level  III,  and  68.0%  at  level  IV  (total  care).  Corresponding  post-ROT 
figures were 12.0%, 14.4%, 24.8%, and 48.8%. Although the authors
confidently  concluded  that  "on  the  basis  of  the  current  data,  reality 
orientation  as  part  of  a total  rehabilitation  program  can  be  evaluated 
as  highly  successful"  (p.  803),  it  is  clear  that,  in  the  absence  of 
reliability  information,  control  data,  or  appropriate  tests  of 
statistical  significance,  such  claims  were  premature  and  poorly 
grounded. 

Barnes  (1974)  studied  two  male  and  four  female  nursing  home 
residents  with  a mean  age  of  81  years.  All  six  were  diagnosed  as 
having senile dementia, exhibiting confusion, memory loss, and
disorientation. Subjects were assessed four times--initially, after a
four-month control period, immediately following an intensive,
six-week ROT treatment period, and one week after termination of ROT.
Data  were  obtained  with  a questionnaire  consisting  of  (a)  23  questions 
for  each  subject  to  answer  as  a measure  of  learning,  and  (b)  five 
questions  about  the  subject's  behavior,  to  be  completed  by  the  nursing 
director. Despite the author's acknowledgement that the reliability
and  validity  of  the  questionnaire  were  not  assessed,  he  was  not 
deterred  from  administering  it  repeatedly,  interpreting  its  results, 
and  submitting  them  for  publication.  Results  of  the  questionnaire 
indicated nonsignificant changes during the control and treatment
periods in the subjects' learning and ward behavior. Follow-up testing
after one week showed a significant decline in the subjects'
learning  but  no  behavioral  changes.  In  support  of  ROT  effectiveness, 
however,  the  author  then  resorted  to  nurses'  anecdotal  reports,  which 
were  highly  favorable.  In  fact,  it  was  quite  meaningless  to  infer 
anything  about  the  effectiveness  of  ROT  from  the  results  of  a measure 
with  unknown  reliability  and  validity.  It  appears  that,  at  best,  the 
questionnaire  may  have  been  insensitive  to  the  positive  changes  which 
the  nurses  claimed  to  have  observed. 

Brook,  Degun,  and  Mather  (1975)  studied  eight  male  and  ten  female 
psychogeriatric  inpatients.  The  patients,  whose  mean  age  was  73.3 
years,  had  been  continuously  hospitalized  for  an  average  of  1.9  years. 
Subjects were assessed using an undescribed "rating scale of intellectual
and social functioning." The raters and rating procedure were
not  described,  and  factors  such  as  reliability  and  validity  were  not 
even  mentioned.  On  the  basis  of  initial  ratings,  subjects  were 
divided  into  three  groups  according  to  level  of  functioning.  Half  of 
each  group  served  as  control  subjects.  Experimental  subjects  took 
part in intensive classroom ROT for eight weeks, while control subjects
were simply brought to the ROT room daily and left to occupy
themselves.  Weekly  testing  suggested  that  all  subjects  made  initial 
gains,  presumably  due  to  the  novelty  (Hawthorne  effect).  Thereafter, 
however, control subjects deteriorated back to baseline, while experimental
subjects continued to progress. By the eighth week, within
each condition, the groups who were initially rated as "high functioning"
were performing slightly better than "middle functioning" groups,
who,  in  turn,  bettered  the  "low  functioning"  groups.  Again,  there  is 
no  reason  to  have  confidence  in  these  findings  and  conclusions,  since 
the  quality  of  the  data  is  totally  unknown. 

Harris and Ivory (1976) compared two groups of 29 female geriatric
patients at the Florida State Hospital. The ROT group had a
mean  age  of  66.6  years  and  had  been  continuously  hospitalized  for  an 
average  of  24.6  years.  The  control  group  averaged  71.1  years  of  age 
and  23.0  years  of  hospitalization.  Most  subjects  carried  diagnoses  of 
chronic  organic  brain  syndrome,  while  a few  were  mentally  deficient. 
All subjects received traditional hospital treatment, including
chemotherapy, music therapy, and occupational therapy. Experimental
subjects  additionally  received  24-hour  ROT,  administered  by 
psychiatric aides. After an unstated period of treatment, all subjects
were rated using the Florida State Hospital Patient Behavior
Rating  Sheet,  which  addresses  a variety  of  ward  behaviors  and  "verbal 
orientation"  behaviors,  as  well  as  recording  the  rater's  own 
impression  of  the  subject  in  several  behavioral  spheres.  Reliability 
data  are  not  supplied.  It  is  curious  that,  although  all  subjects  were 
rated  simultaneously  and  independently  by  two  raters,  and  eight  pairs 
of  raters  participated  in  the  study,  interrater  reliability  was  not 
reported.  Also  disconcerting  is  the  fact  that  each  aide  who  worked 
with a subject was also a rater, and all aides, of course, knew which
patients  were  assigned  to  each  group.  This  already  enormous  risk  of 
rater bias (O'Leary, Kent, & Kanowitz, 1975) was amplified by part of
the  rating  procedure,  which  required  the  aide/rater  to  elicit  and  rate 
scorable  responses.  Parenthetically,  no  treatment  effect  on  ward 
behavior emerged, although ROT subjects performed significantly better
on  six  of  nine  verbal  orientation  tasks. 

Citrin  and  Dixon  (1977)  reported  perhaps  the  most  rigorous 
attempt  to  assess  the  effectiveness  of  ROT.  At  a relatively  large 
geriatric  institution  in  Lincoln,  Nebraska,  ROT  was  instituted  on  one 
ward,  and  withheld  from  another.  The  comparability  of  the  two  wards 
was  not  described,  however,  and  it  is,  therefore,  impossible  to  know 
whether assignment of patients to wards (and, therefore, to treatments)
was random. Twelve patients on the ROT ward, mean age 84
years,  and  13  patients  on  the  control  ward,  mean  age  83  years,  served 
as  subjects.  All  showed  disorientation,  but  none  were  nonambulatory, 
blind,  or  deaf.  Two  measures  were  administered  to  all  subjects  before 
and  after  a two-month  period,  during  which  the  experimental  subjects 
received  24-hour  and  classroom  ROT.  One  measure  used  was  the 
Geriatric Rating Scale (GRS) (Plutchik et al., 1970), a well-established,
31-item behavioral scale whose reported reliability
ranges  from  .87  to  .94  (see  CHAPTER  II  below).  Also  used  was  the 
unpublished  Reality  Orientation  Information  Sheet  (ROIS),  developed  in 
1973 at the Tuscaloosa VA hospital. The ROIS includes 20 orientation
questions to be asked of the subject and five opinion questions
to  be  asked  of  the  ward  aide  about  the  subject.  These  authors 
obtained  an  interrater  reliability  of  .98  with  the  ROIS,  based  on  a 
sample of ten patients. Experimental data were reduced to four scores
for each test: mean pre- and post-treatment scores for the experimental
and control groups. An elaborate and wholly inappropriate
(Campbell & Stanley, 1963, pp. 22-23) array of t-tests was unfortunately
used to test for treatment effects, rather than computing
gain  scores  (post-treatment  scores  minus  pre-treatment  scores),  upon 
which  t-tests  could  be  meaningfully  conducted.  The  array  method 
yielded  "significant"  results  with  the  ROIS  data  but  not  with  the  GRS 
data,  from  which  it  was  concluded  that  ROT  may  have  a positive  impact 
upon cognitive orientation but little effect upon behavioral
functioning.
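The gain-score analysis recommended above can be sketched briefly. The Python example below is an illustration, not a reanalysis of Citrin and Dixon's data. It computes each subject's gain (post-treatment score minus pre-treatment score) and then compares the two groups' mean gains with an ordinary pooled-variance Student t test on n1 + n2 - 2 degrees of freedom. The choice of the pooled-variance form, the function names, and all scores are assumptions. No p value is computed, since that would require a t-distribution routine.

```python
import math


def gain_scores(pre, post):
    """Post-treatment minus pre-treatment score for each subject."""
    return [after - before for before, after in zip(pre, post)]


def pooled_t(x, y):
    """Independent-samples Student t (pooled variance) and its degrees of freedom."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    ss = sum((v - mx) ** 2 for v in x) + sum((v - my) ** 2 for v in y)
    pooled_var = ss / (nx + ny - 2)
    t = (mx - my) / math.sqrt(pooled_var * (1 / nx + 1 / ny))
    return t, nx + ny - 2


# Hypothetical orientation scores for three treated and three control subjects.
rot_gain = gain_scores(pre=[10, 12, 11], post=[14, 15, 13])      # [4, 3, 2]
control_gain = gain_scores(pre=[11, 10, 12], post=[11, 11, 12])  # [0, 1, 0]
t, df = pooled_t(rot_gain, control_gain)
print(f"t = {t:.2f} on {df} df")
```

The invented gains (4, 3, 2 versus 0, 1, 0) give t = 4.00 on 4 degrees of freedom. A single test on the gains thus replaces the array of separate pre- and post-treatment comparisons.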

Purpose  of  the  Study 

In response to the unfortunate guesswork now involved in choosing
a behavioral  dependent  measure  for  geriatric  therapy  outcome  research, 
the  present  study  directly  compared  a small  number  of  those  scales 
which  showed  the  most  promise  for  successful  refinement  and  popular 
acceptance.  Reliability  and  validity  were  rigorously  assessed,  and 
the  advantages  and  disadvantages  peculiar  to  each  scale  were 
identified.  On  the  basis  of  this  newly  generated  information, 
straightforward  recommendations  of  measures  for  specific  purposes  are 
CHAPTER II
METHOD 

Measures 

As  mentioned  above,  dozens  of  behavioral  rating  scales  have  been 
published,  many  of  which  are  potentially  applicable  to  geriatric 
research. In order to select a practical number of scales to be
studied from the large number available, broad guidelines were
developed  which  proved  helpful,  but  neither  exhaustive  nor  absolute. 
The  major  selection  criterion  specified  scales  designed  to  assess  an 
individual's  level  of  function  in  behavioral  domains  necessary  for 
normal,  independent  living.  Such  domains,  often  called  activities  of 
daily  living  (ADL),  include  eating,  dressing,  grooming,  ambulation, 
use  of  appliances,  etc.  The  Bender-Gestalt  Test  (Bender,  1938),  by 
way  of  contrast,  did  not  meet  this  criterion.  Although  it  is  well 
known  as  a diagnostic  instrument  and  is  of  demonstrated  usefulness  in 
differentiating  between  diagnostic  groups  (Lacks,  Harrow,  Colbert,  & 
Levine,  1970),  it  does  not  directly  address  behavioral  competence  in 
daily functioning. A more direct, behavioral assessment technology
offers the advantage of immediate operationalization of concepts and of
clear, clinically useful therapeutic inferences from identified
deficits (Lindsley, 1964; Rebok & Hoyer, 1977). It has been demonstrated,
as well, that behavioral assessment, when compared with
global,  subjective  ratings,  is  relatively  resistant  to  raters' 
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expectancy  biases  (Kent,  O'Leary,  Diament,  & Dietz,  1974;  Schuller  & 
McNamara,  1976). 

Scales to be evaluated had to appear capable of measuring highly
discrepant levels of functional competence, from normal to "senile"
behavior. Procedure length could not be prohibitive, as it is with the
Camelot Behavioral Checklist (Foster, 1974), with 399 items, or even
the Geriatric Profile (Evenson, 1971), with 74. Available psychometric
data had to be, at worst, equivocal (although truly damaging data are
seldom published). Finally, there had to be no indication that the
scale had already met with overt rejection by potential users.

The five scales selected for evaluation by these criteria were the
Geriatric Rating Scale, the Physical Self-Maintenance Scale, the Rapid
Disability Rating Scale, the Minimal Social Behavior Scale, and the
Performance Test of Activities of Daily Living.

The Geriatric Rating Scale (GRS) (Plutchik, Conte, Lieberman,
Bakur, Grossman, & Lehrman, 1970), a revision of the Stockton GRS
(Meer & Baker, 1966), consists of 31 items, each of which is scored 0,
1, or 2. Excellent correlational interrater reliability has been
obtained, ranging from .87 (N=86; Plutchik et al., 1970) to .94 (N=20;
Plutchik & Conte, 1972), and the correlation between ratings obtained
approximately one year apart was .65 (N=132; Plutchik & Conte, 1972).
Scores on the GRS differentiated between hospitalized geriatric and
hospitalized non-geriatric patients (N=243, p<.001; Plutchik et al.,
1970) and between organically and functionally impaired
psychogeriatric patients (N=73, p<.001; Dastoor, Norton, Boillat,
Minty, Papadopoulou, & Muller, 1975). Scores have shown high agreement
with clinical staff ratings (Plutchik et al., 1970), and may have
prognostic value in terms of discharge (Plutchik & Conte, 1972). As a
result, the GRS was adopted by the Psychopharmacology Research Branch
of NIMH for inclusion in its Assessment Battery for Nurses, and one
study (Miller & Parachek, 1974) employed the GRS as the criterion
measure against which to validate a newer geriatric rating scale.

The GRS has also been factor analyzed (N=370; Smith, Bright, &
McCloskey, 1977), yielding factors that have been named
Withdrawal/Apathy, Antisocial Disruptive Behavior, and Deficits in
Activities of Daily Living. Findings of significant sex differences on
only the latter two factors have helped to explain some of the
equivocal sex results of earlier research. Significant sex differences
also mean that the present project, which studied only males, yielded
information generalizable only to males. In the present study, the
three GRS factors, as described in Smith et al. (1977), served as
autonomous units of analysis in lieu of an overall GRS score.
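To make the scoring concrete, here is a minimal sketch. It assumes each of the 31 GRS items is stored as an integer 0, 1, or 2, so an overall score can range from 0 to 62. It also shows how the three factor scores used here in place of a total might be computed. The item-to-factor mapping (`FACTOR_ITEMS`) and all function names are hypothetical placeholders. The actual assignments are those of Smith et al. (1977) and are not reproduced here.

```python
# Minimal sketch of Geriatric Rating Scale (GRS) scoring.
# Assumption: each of the 31 items is stored as an integer 0, 1, or 2.
# FACTOR_ITEMS is HYPOTHETICAL: the real item-to-factor assignments are
# those of Smith, Bright, & McCloskey (1977) and are not reproduced here.
from typing import Dict, List

N_ITEMS = 31
VALID_SCORES = {0, 1, 2}

FACTOR_ITEMS: Dict[str, List[int]] = {  # hypothetical 1-based item numbers
    "GRS1": [1, 2, 3],  # Withdrawal/Apathy
    "GRS2": [4, 5],     # Antisocial Disruptive Behavior
    "GRS3": [6, 7, 8],  # Deficits in Activities of Daily Living
}


def check_items(items: List[int]) -> None:
    """Reject records with the wrong length or out-of-range item scores."""
    if len(items) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} items, got {len(items)}")
    if any(score not in VALID_SCORES for score in items):
        raise ValueError("each GRS item must be scored 0, 1, or 2")


def grs_total(items: List[int]) -> int:
    """Overall GRS score: sum of all 31 items, so the range is 0 to 62."""
    check_items(items)
    return sum(items)


def grs_factor_scores(items: List[int]) -> Dict[str, int]:
    """Sum the items belonging to each factor (used instead of a total)."""
    check_items(items)
    return {name: sum(items[i - 1] for i in idx) for name, idx in FACTOR_ITEMS.items()}
```

Validating every record before summing catches data-entry errors, such as a stray 3, that would otherwise silently inflate a factor score.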

Based  on  the  advanced  psychometric  development  of  the  GRS  (a 
quality  which  the  reader  will  better  appreciate  after  reviewing  the 
following  scales),  it  was  expected  that  this  scale  would  fare  quite 
well  in  the  present  study. 

The Physical Self-Maintenance Scale (PSMS), adapted by Lawton and
Brody (Lawton, 1971, 1972) from a scale used at Langley-Porter
(Lowenthal, 1964), is a Guttman-scaled set of observer ratings of
competence in dressing, grooming, eating, bathing, locomotion, and
toileting. An individual's score ranges from zero to six, one point
being given for each function in which the most independent rating is
checked. It was considered that an enlarged scoring procedure might
make this scale more sensitive to degrees of impairment. Lawton and
Brody (1969) reported that independent ratings of 36 patients by two
licensed practical nurses correlated .87, and that ratings of 14 other
patients by two research assistants correlated .91. Extensive
validity information for this scale has not been reported, but data
are available differentiating mentally impaired aged persons from
healthier applicants to a home for the aged (Lawton, 1972). Surveying
data from 343 cases, Lawton (1971) found mean scores to drop quickly
as one moves from old age home applicants (4.1), to elderly mental
hospital admissions (3.5), to protective custody patients (2.0).
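The standard PSMS scoring rule described above can be sketched as follows. The sketch assumes each function is rated on an ordinal scale in which level 1 is the most independent rating, and it assumes five levels per function. The second function is a hypothetical "enlarged" scoring that gives partial credit for every step of independence, added only to illustrate why finer scoring could register smaller changes in impairment. It is not the rule used for the study's PSMSX.

```python
# Minimal sketch of PSMS scoring as described above.
# Assumptions: each function is rated on an ordinal scale of N_LEVELS levels,
# where level 1 is the most independent rating; N_LEVELS = 5 is assumed.
from typing import Dict

FUNCTIONS = ("dressing", "grooming", "eating", "bathing", "locomotion", "toileting")
MOST_INDEPENDENT = 1
N_LEVELS = 5


def validate(ratings: Dict[str, int]) -> None:
    """Require a rating in 1..N_LEVELS for each of the six functions."""
    missing = [f for f in FUNCTIONS if f not in ratings]
    if missing:
        raise ValueError(f"missing ratings for: {missing}")
    for f in FUNCTIONS:
        if not MOST_INDEPENDENT <= ratings[f] <= N_LEVELS:
            raise ValueError(f"rating for {f} must be 1..{N_LEVELS}")


def psms_score(ratings: Dict[str, int]) -> int:
    """Standard score: one point per function rated at the most
    independent level, giving a range of 0 to 6."""
    validate(ratings)
    return sum(1 for f in FUNCTIONS if ratings[f] == MOST_INDEPENDENT)


def psms_expanded_score(ratings: Dict[str, int]) -> int:
    """HYPOTHETICAL enlarged scoring (not the study's PSMSX rule): partial
    credit for each level of independence, range 0 to 6 * (N_LEVELS - 1)."""
    validate(ratings)
    return sum(N_LEVELS - ratings[f] for f in FUNCTIONS)
```

Under the standard rule, a subject who slips one level on dressing loses a full point. Under the enlarged rule, the same change costs one point out of 24, while larger declines in other functions continue to register.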

The Rapid Disability Rating Scale (RDRS) (Linn, 1967) applies a
three-point "frequency of occurrence or degree of severity" scale to
the assessment of performance in sixteen categories of ADL and ward
behavior. Scores thus range from 16 to 48, higher scores reflecting
greater disability. Nurse completion of the RDRS for one patient is
reported to require only two minutes. Reported interrater reliability
estimates range from .894 (N=1000; Linn, Gurel, & Linn, 1977) to .913
(N=20; Linn, 1967). Based on a sample of 1000 male nursing home
residents, normative data (mean score and percentage of sample showing
the disability) are available for each of the scale's sixteen items
(Linn et al., 1977). RDRS scores have been found to correlate
significantly with the number of previous hospitalizations, with the
subsequent death rate, and with physicians' ratings of medical status
change in elderly patients. The RDRS has been used frequently by its
author and her associates in exploratory research on nursing home
program effectiveness and nursing home population characteristics
(Carmichael & Linn, 1974; Greenwald & Linn, 1972; Linn & Gurel, 1969;
Orgren & Linn, 1971).
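The arithmetic behind the 16-to-48 range, and the two kinds of item norms mentioned above, can be sketched briefly. The sketch assumes each item is coded 1, 2, or 3, with higher codes meaning more disability. It also assumes that an item counts as "showing the disability" whenever its code is above 1. That threshold is an assumption for illustration, not Linn's stated definition.

```python
# Minimal sketch of RDRS scoring and item norms.
# Assumptions: each of the 16 items is coded 1, 2, or 3, with higher codes
# meaning more frequent or more severe disability, and an item counts as
# "showing the disability" when its code is above 1.
from typing import List, Tuple

N_ITEMS = 16
CODES = (1, 2, 3)


def rdrs_total(items: List[int]) -> int:
    """Total score: sum of the 16 item codes (range 16 to 48)."""
    if len(items) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} items, got {len(items)}")
    if any(code not in CODES for code in items):
        raise ValueError("each RDRS item must be coded 1, 2, or 3")
    return sum(items)


def item_norms(sample: List[List[int]]) -> List[Tuple[float, float]]:
    """For each item: (mean code, percentage of sample showing the disability)."""
    for record in sample:
        rdrs_total(record)  # validates each record
    n = len(sample)
    norms = []
    for i in range(N_ITEMS):
        codes = [record[i] for record in sample]
        mean = sum(codes) / n
        pct_disabled = 100.0 * sum(1 for c in codes if c > 1) / n
        norms.append((mean, pct_disabled))
    return norms
```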

The Minimal Social Behavior Scale (MSBS) (Lawton, 1971, 1972) is
an adaptation of a scale of the same name by Farina, Arenberg, and
Guskin (1957). It requires administration by a clinician and presents
the subject with twelve basic social stimuli, such as a greeting and a
chance to do a favor, the responses to which are scored for social
appropriateness. One of the 25 items that make up this scale
involves administering the entire Mental Status Questionnaire (Kahn,
Goldfarb, Pollack, & Peck, 1960). The geriatric norms available for
this scale indicate that "the average score for relatively intact
geriatric mental patients [is] 22, whereas the average score for
impaired and uncooperative patients [is] 15" (Lawton, 1971, p. 474).
The reliability of this scale is unknown, although its 30-item and
32-item precursors (Mangun & Webb, 1956; Farina et al., 1957)
reportedly attained interrater reliabilities of .96 (N=15) and .95
(N=35), respectively. Neiditch and White (1976) included Lawton's
MSBS in a test battery for repeated assessment of an inpatient
psychogeriatric sample (N=36), but did not assess reliability.
Inclusion of the MSBS substantially improved the clinical team's
ability to predict five-month outcome, although the Mental Status
Questionnaire accounted for most of this gain. This finding was
consistent with that of Markson and Levitz (1973). Dastoor et al.
(1975), who also did not assess reliability, found the MSBS to
distinguish between organically and functionally impaired geriatric
patients.

The Performance Test of ADL (PADL) (Kuriansky, Gurland, Fleiss, &
Cowan, 1976; Kuriansky & Gurland, 1976) is unique in that it requires
subjects to actually demonstrate selected activities of daily living
for the test administrator. Sixteen tasks are included, which may be
roughly grouped as follows: eating, dressing, grooming and toileting,
and use of appliances. Each of the sixteen tasks has been divided into
its gross muscular components for scoring. For example, "hair combing"
consists of four distinct, scorable elements: (a) takes comb in hand,
(b) grasps comb properly, (c) brings comb to hair, and (d) makes
combing motion. Developed as part of a cross-national,
interdisciplinary study of psychogeriatric assessment technology, the
PADL has been reported to have high interrater reliability (.902;
Kuriansky & Gurland, 1976), but neither the reliability sample size
nor the details of administration and computation are reported.
Moderate agreement with verbal ratings by family members (N=45,
kappa=.37) and some prognostic utility have also been reported
(Kuriansky et al., 1976).
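Component scoring of a single PADL task can be sketched using the hair-combing example. The four element names come from the text above. Crediting one point per element performed, and summing task scores into a total, are assumptions made only for illustration, because the scoring details are not reported. The function names are hypothetical.

```python
# Minimal sketch of PADL component scoring for a single task.
# The four hair-combing elements are taken from the description above;
# crediting one point per element performed, and summing tasks into a
# total, are ASSUMPTIONS for illustration only.
from typing import Dict, Iterable, Sequence

HAIR_COMBING: Sequence[str] = (
    "takes comb in hand",
    "grasps comb properly",
    "brings comb to hair",
    "makes combing motion",
)


def score_task(elements: Sequence[str], performed: Iterable[str]) -> int:
    """Count how many of a task's scorable elements the subject performed."""
    done = set(performed)
    unknown = done - set(elements)
    if unknown:
        raise ValueError(f"not elements of this task: {sorted(unknown)}")
    return sum(1 for e in elements if e in done)


def padl_total(task_scores: Dict[str, int]) -> int:
    """Hypothetical total: sum of element scores across the tasks given."""
    return sum(task_scores.values())
```

Breaking each task into observable motor elements is what makes a subject-present test scorable by two observers watching the same performance.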

Subjects 

This project involved samples of subjects from five subsets of
one population, male veterans. Four samples were gathered within the
Gainesville, Florida, Veterans Administration Medical Center (GVAH),
and one from among veterans living outside the hospital.

Group I consisted of 30 male inpatients, age 60 and over, on the
neurology ward of GVAH. The mean age of this group was 69.37
(SD=8.65), and age ranged from 60 to 87. The modal diagnosis was
cerebrovascular accident (CVA) (45%), followed by seizures and
dementia (13% each).

Group II consisted of 30 male neurology patients under the age of
60 on the neurology ward of GVAH. The mean age of this group was
48.83 (SD=12.15), and age ranged from 19 to 59. The modal diagnosis
was stroke (CVA or transient ischemic attack) (30%), followed by
dementia (10%).

Group III consisted of 30 male patients, age 60 or over, on the
medical wards of GVAH, excluding psychiatry, neurology, and surgery.
The mean age of this group was 69.63 (SD=8.92), with a range of 60 to
87. The modal diagnosis was mass, either benign or cancerous (43%).

Group IV consisted of 20 male psychiatric inpatients at GVAH, age
60 or over, whose diagnoses did not include organic impairment. The
mean age of this group was 63.25 (SD=3.49), and age ranged from 60 to
73. The modal diagnosis was neurotic depression (35%), followed by
affective psychosis (20%) and chronic schizophrenia (15%).

Group V consisted of 30 relatively healthy, noninstitutionalized
male veterans, age 60 and over. Retired men who volunteer at GVAH
provided the largest share of this sample (43%), followed by veterans
visiting outpatient clinics (36%). The mean age of this group was
72.17 (SD=7.70), with a range of 61 to 86.

For  Groups  I through  IV,  patients  who  met  the  stated  criteria 
became  eligible  as  participants  after  seven  days  of  residence  on  the 
ward.  This  gave  the  raters  time  to  become  familiar  with  the  subjects. 
Those  eligible  patients  who  agreed  to  participate  signed  a consent 
form  (see  APPENDIX  A).  Subject  recruitment  ceased  when  data  had  been 
collected  on  the  required  number  of  subjects. 

Subject samples were defined by hospital service, rather than by
specific diagnosis, because these scales are most often used with
populations containing a mixture of diagnoses, all falling within some
broad category (e.g., medical, psychogeriatric). The disadvantage of
this method, that it yields less information about what determines a
given subject's score, is offset by the advantage that the samples
thus obtained are more representative of the populations with which
these scales will be used.

Procedure

Interrater  Reliability 
Subjects 

Thirty  Group  I subjects  and  five  subjects  from  each  of  the  four 
remaining  groups  served  as  subjects  for  this  phase  of  the  project,  for 
a total  of  50  subjects. 

Procedure 

Every subject was rated twice with each of the five main scales
described above (RDRS, PSMS, GRS 3, MSBS, and PADL). For subjects
from Groups I through IV, ward nurses, full-time nursing aides,
medical students, medical interns, and medical residents served as
raters with the first three instruments (GRS, PSMS, RDRS). These
raters were briefed by the principal investigator (PI) and/or his
assistant concerning the scales and their proper use. The dual
ratings with each scale were simultaneous and independent, as actual
subject presence was unnecessary for the completion of these scales.
Subjects from Group V were rated by their children, spouses, and/or
close acquaintances, after these raters had been familiarized with the
scales.

The principal investigator and his assistant served as the rating
pair for the remaining two instruments (MSBS, PADL), as these required
direct administration. The assistant, Marty Ludwig, was a senior at
the University of Florida majoring in psychology, who continued in his
role as assistant after graduating. Each subject was administered
each of these two scales only once, but his performance on each scale
was scored twice, that is, simultaneously and independently by the PI
and the assistant. Both the PI and the assistant conducted actual
administrations during the course of data collection, which proceeded
from June, 1979, to January, 1980.

Analysis

This study is the first assessment of the interrater reliability
of any of the included scales by anyone other than the scales'
respective authors. Nine scales were analyzed: five main scales
(RDRS, PSMS, GRS Factor 3, MSBS, & PADL) and four additional scales
(PSMSX, a modification of the PSMS; MSQ, the Mental Status
Questionnaire in the MSBS; and GRS Factors 1 and 2). For each scale
or factor, a product-moment correlation coefficient was computed to
measure the agreement between two independent ratings of subjects;
this coefficient is the test's interrater reliability. Correlational
measurement of interrater reliability was employed because
item-agreement methods (Repp, Deitz, Boles, Deitz, & Repp, 1976) are
unreasonably conservative when used with instruments that assess
several different behaviors (Walls et al., 1977). For purposes of
research involving only comparisons of group means, a correlation of
.7 is considered good, and findings exceeding .85 are excellent. When
a test score is to be used in individual assessment and clinical
decision-making, however, higher standards of reliability are called
for. Therefore, only those scales that demonstrated interrater
reliability of .8 or greater were included in later portions of this
study.
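The reliability criterion can be expressed as a short computation. The sketch computes a product-moment correlation between two raters' scores for the same subjects and applies the .8 retention cutoff. The function names are hypothetical, and any data passed to them would be hypothetical too.

```python
# Minimal sketch of correlational interrater reliability with the .8
# retention cutoff described above. Data used with it are hypothetical.
from math import sqrt
from typing import Sequence, Tuple

CUTOFF = 0.8


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Product-moment correlation between two raters' scores for the same subjects."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length score lists with n >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a rater's scores show no variance; r is undefined")
    return sxy / sqrt(sxx * syy)


def interrater_check(rater_a: Sequence[float], rater_b: Sequence[float],
                     cutoff: float = CUTOFF) -> Tuple[float, bool]:
    """Return r and whether the scale is retained for later analyses."""
    r = pearson_r(rater_a, rater_b)
    return r, r >= cutoff
```

A product-moment correlation rewards consistent ordering of subjects rather than identical scores. If one rater scored every subject exactly one point higher than the other, r would still be 1.0. This is one reason correlational reliability is more lenient than item-agreement methods.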

Construct Validity
Subjects 

The  remainder  of  Groups  II  through  V were  then  recruited  and 
assessed  during  the  period  of  January,  1980,  to  June,  1980. 

Procedure 

Data were generated with the remaining 90 subjects in essentially
the same manner as with the initial 50 subjects, but with one
important change: each subject was rated by only one rater, rather
than two, the scales' interrater reliability having already been
established.

Analysis

All data (N=140) collected with scales that reached minimum
reliability standards were reanalyzed, and a correlation matrix was
computed to indicate the extent of agreement between scales.* This
provides quantitative evidence for the extent of agreement between the
scales, that is, for the scales' convergent construct validity
(Campbell & Fiske, 1959).** The correlation values in this matrix were
evaluated for statistical significance using an alpha level of .05.

It was to be hoped that scales purporting to measure competence
in highly similar behavioral domains would produce highly correlated
scores. To the degree that results conform to this expectation, the
convergent construct validity of the correlated scales is supported.
It was expected, specifically, that the MSBS, which is designed to be
sensitive to a slightly higher level of function than the other
scales, would correlate less highly with the other scales than would
the remaining four among themselves.

*When any reliability subject's (N=50) dual ratings produced unequal
scores with a given scale, one was randomly chosen and used in this
and subsequent analyses.

**The use in behavior rating scale development of the
multitrait-multimethod matrix proposed by Campbell and Fiske (1959)
has been endorsed by Johnson and Bolstad (1973).
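The construct-validity analysis has two steps, which can be sketched together. First, one of each reliability subject's dual ratings is chosen at random when the two disagree. Second, every pair of scales is correlated and each r is tested against zero with t = r * sqrt((n - 2) / (1 - r^2)), on n - 2 degrees of freedom. The scale names follow the study. The function names and any data are hypothetical.

```python
# Minimal sketch of the construct-validity analysis: choose one of each
# reliability subject's dual ratings at random, then correlate every pair
# of scales. Scale names are from the study; all data are hypothetical.
import random
from itertools import combinations
from math import sqrt
from typing import Dict, List, Tuple


def pick_rating(first: float, second: float, rng: random.Random) -> float:
    """Use the shared score when the two ratings agree, else pick one at random."""
    return first if first == second else rng.choice((first, second))


def pearson_r(x: List[float], y: List[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_matrix(scores: Dict[str, List[float]]) -> Dict[Tuple[str, str], float]:
    """Pairwise r for every pair of scales; each list holds one score per subject."""
    return {(a, b): pearson_r(scores[a], scores[b]) for a, b in combinations(scores, 2)}


def t_for_r(r: float, n: int) -> float:
    """t statistic (df = n - 2) for testing r against zero."""
    return r * sqrt((n - 2) / (1 - r * r))
```

RDRS and GRS 3 scores rise with impairment, while PSMS, MSBS, and PADL scores rise with ability. Some correlations in such a matrix are therefore expected to be negative even when the scales agree.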


Criterion-Related Validity
Analysis 
The five subject groups vary along the dimensions of age,
hospital versus community residence, and type of illness or reason
for hospitalization. By appropriately clustering the groups, it was
possible to measure the scales' sensitivity to each of these
dimensions (i.e., concurrent criteria) singly. To this end, four
planned comparisons were carried out, using the same data as in the
construct validity section of this study. In other words, the
relevance of each concurrent criterion (according to which subject
groups were themselves grouped) to competence in activities of daily
living, as measured by each reliable scale, was assessed, as was the
relative usefulness of each reliable scale in detecting relevant
differences between clusters of subject groups. In each comparison,
Hotelling's multivariate T-squared test, using data from the main
scales only, was used to determine the presence of significant
differences, with alpha set at .05. Findings of significance with this
procedure were followed by least significant difference (LSD) multiple
comparisons to locate the differences more precisely. Applied in this
manner, that is, only after a significant overall test, this
relatively powerful post-hoc technique does not inflate the
experiment-wise alpha level.
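The two-sample form of Hotelling's test can be sketched in plain Python. The statistic is T^2 = (n1*n2/(n1+n2)) * d' S^-1 d, where d is the difference between the two clusters' mean vectors and S is their pooled covariance matrix. It converts to F = (n1+n2-p-1)/(p*(n1+n2-2)) * T^2, with degrees of freedom (p, n1+n2-p-1), where p is the number of scales. With 60 subjects and five scales, for example, the degrees of freedom would be (5, 54). The sketch stops at F. A p value could be obtained from the F distribution, for instance with `scipy.stats.f.sf`, which is not used here. The function names and any data are hypothetical, and only the two-cluster case is shown.

```python
# Minimal sketch of a two-sample Hotelling T^2 test (pure Python).
# Each subject is a row of scale scores; data used with it are hypothetical.
from typing import List, Tuple

Matrix = List[List[float]]


def mean_vector(X: Matrix) -> List[float]:
    n = len(X)
    return [sum(row[j] for row in X) / n for j in range(len(X[0]))]


def pooled_covariance(X1: Matrix, X2: Matrix) -> Matrix:
    """Pooled within-group covariance matrix, divisor n1 + n2 - 2."""
    p = len(X1[0])
    S = [[0.0] * p for _ in range(p)]
    for X in (X1, X2):
        m = mean_vector(X)
        for row in X:
            d = [row[j] - m[j] for j in range(p)]
            for i in range(p):
                for j in range(p):
                    S[i][j] += d[i] * d[j]
    dof = len(X1) + len(X2) - 2
    return [[value / dof for value in r] for r in S]


def solve(A: Matrix, b: List[float]) -> List[float]:
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is (nearly) singular")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= factor * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]


def hotelling_t2(X1: Matrix, X2: Matrix) -> Tuple[float, float, Tuple[int, int]]:
    """Return T^2, the equivalent F, and its degrees of freedom (p, n1 + n2 - p - 1)."""
    n1, n2, p = len(X1), len(X2), len(X1[0])
    diff = [a - b for a, b in zip(mean_vector(X1), mean_vector(X2))]
    w = solve(pooled_covariance(X1, X2), diff)  # w = S^-1 d
    t2 = (n1 * n2 / (n1 + n2)) * sum(d * wi for d, wi in zip(diff, w))
    df2 = n1 + n2 - p - 1
    f = df2 / (p * (n1 + n2 - 2)) * t2
    return t2, f, (p, df2)
```

With a single scale (p = 1), T^2 reduces to the square of the ordinary pooled-variance t statistic, which is a convenient check on the implementation.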

Comparison one. To examine the sensitivity of the scales to age
differences among neurology patients (differences that were here
confounded with generational differences; Schaie, 1977), Groups I and
II were compared with one another on the basis of data from the five
main scales. It was hypothesized that older subjects (Group I) would
show more disability than younger subjects (Group II).

This comparison had precedent in the work of Plutchik et al.
(1970), who compared the GRS scores of 50 geriatric inpatients with
those of 36 randomly drawn non-geriatric inpatients. The hypothesis
that age would be associated with greater disability was clearly
supported, in two ways. First, 13 GRS items differentiated between
geriatric and non-geriatric inpatients, all in the expected direction,
at the .05 level or better. It is of interest that the ADL factor of
the GRS (Smith, Bright, & McCloskey, 1977) was the most heavily
represented factor among these 13 items. Second, the mean GRS scores
of a larger sample of 207 geriatric patients and 36 non-geriatric
patients showed highly significant age differences, also in the
expected direction. The same conclusion can be drawn from the data on
the 85 male participants only.

Comparison two. The second comparison examined the scales'
sensitivity to types of illness among elderly inpatients. Groups I,
III, and IV were compared using data from the five main scales. Based
on the work of Dastoor et al. (1975), it was predicted that Group I
(organic) subjects would display more disability than would Group IV
(functionally impaired psychiatric) subjects. Dastoor et al. (1975)
found that, among 80 newly admitted psychogeriatric patients,
organically impaired patients received significantly worse scores on
both the GRS and the MSBS than did functionally impaired patients.
There is apparently no precedent for including a general medical
infirmity group in such a comparison, and the great variability of the
medical disorders included in this sample made prediction of their
relative ADL competence difficult.

It could be argued that research comparing psychogeriatric
patients with and without organic impairment (e.g., Dastoor et al.,
1975) is not the same as research comparing elderly neurological
patients and elderly functional psychiatric patients (as in the
present study). It is probably true, however, that the latter method
yields both (a) purer samples of the target populations, and (b) data
that are readily generalizable to more integrated populations.

Comparison three. A broader, but less well controlled, assessment
of the impact of neurological impairment on ADL skills was also
performed by comparing Groups I and II (elderly and non-elderly
neurological subjects) against Groups III through V, using data from
the five main scales. It must be recognized, however, that age, in
addition to organicity, was unevenly distributed across the
comparison, and that the latter cluster (Groups III through V) was
highly heterogeneous. Naturally, it was predicted that the
neurologically impaired subjects would fare worse than the otherwise
impaired and the healthy subjects.

Comparison four. Finally, Groups I, III, and IV together were
compared with Group V to assess the scales' sensitivity to inpatient
versus non-patient status among elderly men. The prediction clearly
favored the latter group, that is, less disability was expected among
the non-patients. This analysis was similar to that reported by Lawton
(1971), which found different mean PSMS scores among samples of old
age home applicants, psychogeriatric admissions, and protective
custody patients. The present analysis additionally provided tests of
significance.
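The clustering behind the four planned comparisons can be written down as a small data structure. This makes it easy to check which groups enter each comparison and how many subjects each cluster contains. The group sizes come from the Subjects section. The names `GROUP_N`, `COMPARISONS`, and the helper functions are hypothetical. The degrees of freedom assume that every subject has complete data on every scale.

```python
# Sketch of the four planned comparisons as clusters of subject groups.
# Group sizes are those given under Subjects; the degrees of freedom assume
# complete data on every scale for every subject (an assumption).
from typing import Dict, List, Tuple

GROUP_N: Dict[str, int] = {"I": 30, "II": 30, "III": 30, "IV": 20, "V": 30}

COMPARISONS: Dict[str, Tuple[Tuple[str, ...], ...]] = {
    "one": (("I",), ("II",)),                    # age, among neurology patients
    "two": (("I",), ("III",), ("IV",)),          # type of illness, elderly inpatients
    "three": (("I", "II"), ("III", "IV", "V")),  # neurological vs. all others
    "four": (("I", "III", "IV"), ("V",)),        # elderly inpatients vs. non-patients
}


def cluster_sizes(name: str) -> List[int]:
    """Number of subjects in each cluster of a comparison."""
    return [sum(GROUP_N[g] for g in cluster) for cluster in COMPARISONS[name]]


def univariate_df(name: str) -> Tuple[int, int]:
    """(between, within) degrees of freedom for a one-way test of the clusters."""
    sizes = cluster_sizes(name)
    k, n = len(sizes), sum(sizes)
    return k - 1, n - k
```

Comparison Three is the only one that uses all 140 subjects. Comparison Four uses the 110 subjects aged 60 and over.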

Expectations of Scale Performance
The capacity of these scales to yield data that accurately
reflect differences along meaningful subject variables constitutes an
important dimension along which the scales can be ordered and
evaluated. The relative discriminative power of the scales can be
evaluated on the basis of the results of each of the above
comparisons. When other factors (e.g., reliability, length) are
approximately equal, greater discriminative ability would warrant one
scale being recommended over another. The very narrow range of both
previous reliability estimates (.87 to .94) and present reliability
estimates (.85 to .98), and the similarity of scale content, make
predictions of the scales' relative strengths very difficult. It was
reasonable to expect, however, that those scales which had reportedly
discriminated between certain subject categories in the past would
replicate these discriminations between comparable categories in the
present study. Furthermore, the fact that a given scale had already
successfully discriminated between certain groups made it plausible to
predict that the scale would discriminate between those same
categories in this study better than the other, unproven scales.

Therefore, since the GRS was the only scale in this study to have
already discriminated between elderly and non-elderly mental patients
(Plutchik et al., 1970), it was predicted that the GRS would yield the
most discriminative data between Groups I and II (Comparison One).
Similarly, the GRS and the MSBS, having already been found to
differentiate between organically and functionally impaired
psychogeriatric patients (Dastoor et al., 1975), were predicted to be
the most effective discriminators between subject Groups I, III, and
IV (Comparison Two). The existence of rough norms on the PSMS (Lawton,
1971) for groups that probably differ significantly along the
dimension of organic impairment suggested that this scale would trail
the GRS and MSBS in discriminative power, but exceed the remaining
scales. Finally, with the extension of the organic/nonorganic
dimension across ages in Comparison Three, the same predictions of
scale effectiveness would hold as in Comparison Two.


CHAPTER  III 
RESULTS 

Interrater Reliability

As can be seen in Table One, all main and additional measures
proved reliable at a probability level of .0001, and all but one
(GRS 2) well exceeded the cutoff for further study of r=.8. Of the
five main scales (RDRS, PSMS, GRS 3, MSBS, & PADL), the PADL proved
the most reliable (.985), followed by the MSBS (.963), GRS 3 (.924),
PSMS (.895), and RDRS (.853).

Construct Validity

Table Two shows the group means and standard deviations (SD) for
the entire sample (N=140). Close examination reveals considerable
variability in the SD across samples with several tests, especially
the PADL. In most cases, this appears attributable to a ceiling
effect, whereby many subjects passed all items, thus limiting the
variance. This raises the possibility that the present data set may
violate the assumption, made by the multivariate statistical
procedures used later, that the sample variance-covariance matrices
are equal. Sophisticated statistical programs to compare these
matrices do exist, but were not run with these data. The assumption of
equality is generally considered to be quite robust, and the effect or
meaning of its violation in a multivariate context is not well
understood.* Nevertheless, possible limitations on the
interpretability of the statistical results obtained cannot yet be
ruled out.

*See footnote p. 36.

INTERRATER RELIABILITY

Table Three shows the intercorrelations of all main scales and
additional measures used. Every correlation is in the expected
direction, with increased impairment on one scale reflecting increased
impairment on all others, regardless of scoring direction (PSMS, MSBS,
& PADL scores correlate positively with ability; RDRS & GRS 3 scores
correlate positively with impairment). Every correlation between
reliable scales is significant at p=.0001. In absolute value, these
correlations range from .498 to .893.

Table Four shows the results of a principal components analysis
of the main scales. Using the customary requirement of an eigenvalue
equal to or greater than one, the analysis yielded only one
interpretable factor, which accounts for 79.1% of the overall
variance. Because every main scale loads heavily on this factor, it is
appropriately labeled ADL. The scale loadings on this ADL factor can
be interpreted as indicators of each scale's overall agreement with
the other scales, and can therefore be used to rank the scales by the
extent of that agreement. GRS 3 ranks first in this regard, with a
factor loading of .942, followed by RDRS (.928), PADL (.907), PSMS
(.868), and MSBS (.793). The relatively weak agreement of the MSBS
with the other scales was expected (see METHOD), and is treated in
more detail below (see DISCUSSION).
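The reported loadings can be checked against the 79.1% figure using two facts. First, the eigenvalue of the first principal component of a correlation matrix equals the sum of its squared loadings. Second, the total variance of p standardized scales is p. Here .942^2 + .928^2 + .907^2 + .868^2 + .793^2 is about 3.953, and 3.953 / 5 is about 79.1%. The sketch below performs that check. It also includes a minimal power-iteration routine for the first component of a correlation matrix; the function names are hypothetical. Power iteration finds only the largest eigenvalue. Applying the eigenvalue-of-one criterion to every component would require the full eigendecomposition, for example with `numpy.linalg.eigh`, which is not used here.

```python
# Check of the reported principal components result, plus a minimal power
# iteration sketch for the first component of a correlation matrix.
from math import sqrt
from typing import Dict, List, Tuple

REPORTED_LOADINGS: Dict[str, float] = {
    "GRS3": 0.942, "RDRS": 0.928, "PADL": 0.907, "PSMS": 0.868, "MSBS": 0.793,
}


def variance_from_loadings(loadings: Dict[str, float]) -> Tuple[float, float]:
    """Eigenvalue = sum of squared loadings; share = eigenvalue / number of scales."""
    eigenvalue = sum(x * x for x in loadings.values())
    return eigenvalue, eigenvalue / len(loadings)


def first_component(R: List[List[float]], iters: int = 200) -> Tuple[float, List[float]]:
    """Largest eigenvalue of R and its loadings (unit eigenvector * sqrt(eigenvalue))."""
    p = len(R)
    v = [1.0] * p  # start vector; must not be orthogonal to the first component
    for _ in range(iters):
        w = [sum(R[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    Rv = [sum(R[i][j] * v[j] for j in range(p)) for i in range(p)]
    eigenvalue = sum(v[i] * Rv[i] for i in range(p))  # Rayleigh quotient
    return eigenvalue, [x * sqrt(eigenvalue) for x in v]
```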

Criterion-Related Validity

Comparison  One 

The results of this comparison are shown in Table Five. Every
scale revealed greater impairment among elderly than among non-elderly
neurology inpatients. Taken together, however, the five main scales
narrowly failed to discriminate between elderly and non-elderly
neurology inpatients at alpha=.05 (multivariate probability=.0549).
Examination of the univariate results reveals that this failure was
due to the MSBS, which yielded group differences with a probability of
.1467. The other four scales each discriminated significantly between
the two age groups.

CONVERGENT CONSTRUCT VALIDITY
CORRELATION MATRIX
N = 140

TABLE FIVE

CRITERION RELATED VALIDITY: COMPARISON ONE

MULTIVARIATE RESULTS (a)
   F        PROBABILITY
   2.32     .0549

UNIVARIATE RESULTS
                      F        PROBABILITY
MAIN SCALES
   RDRS               5.75     .0197
   PSMS               5.05     .0285
   GRS3               9.13     .0037
   MSBS               2.16     .1467
   PADL               4.38     .0407
ADDITIONAL SCALES
   PSMSX              7.57     .0079
   MSQ                6.80     .0116
   GRS1              10.52     .0020
   GRS2               3.60     .0628

(a) Variables: RDRS, PSMS, GRS3, MSBS, & PADL

Such a phenomenon (i.e., multivariate nonsignificance despite
strong univariate results) is neither unusual nor contradictory. At
least two factors may have contributed to its occurrence in this case.
First, a slightly larger sample would have yielded more degrees of
freedom and, therefore, probably multivariate significance. Second,
because of the especially high intercorrelations of the four "pure"
ADL scales (see Table Three), they actually accounted for little more
total variance than just one of them alone would have. The inclusion
of all four nevertheless cost degrees of freedom in the multivariate
analysis, decreasing its power. In other words, the inclusion of
highly redundant measures created a situation of relatively high
covariance and little unique variance, leading predictably toward
multivariate nonsignificance.*
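A simple calculation helps illustrate the degrees-of-freedom argument. In a two-sample Hotelling test, F = (N - p - 1) / (p * (N - 2)) * T^2, where N is the total sample and p the number of scales. Each added scale therefore shrinks the multiplier that turns T^2 into F, while a nearly redundant scale adds little to T^2 itself. The helper below, `f_multiplier`, is a hypothetical name. The calculation ignores the accompanying change in the critical F value, so it is only a rough illustration.

```python
# Illustration of the degrees-of-freedom argument above: the factor that
# converts a two-sample Hotelling T^2 into F shrinks as scales are added.
def f_multiplier(n_total: int, p: int) -> float:
    """F = f_multiplier * T^2, with df (p, n_total - p - 1)."""
    if not 1 <= p < n_total - 1:
        raise ValueError("need 1 <= p < n_total - 1")
    return (n_total - p - 1) / (p * (n_total - 2))


# Comparison One: 60 subjects (Groups I and II), one to five scales.
for p in range(1, 6):
    print(p, round(f_multiplier(60, p), 3))
```

With 60 subjects, the multiplier is 1.0 for a single scale but about 0.186 for five. T^2 would therefore need to be roughly 5.4 times larger with five scales to produce the same F.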

Comparison Two

As Table Six shows, highly significant overall differences were found between elderly inpatients on three hospital services (neurology, medicine, and psychiatry). These differences appeared both when the five main scales were analyzed together and when each scale was analyzed individually. On every scale, neurological patients proved more impaired than psychiatric patients. In every case, furthermore, the mean impairment level of medical patients fell between the neurological and psychiatric means. Table Seven shows the results of LSD multiple comparisons performed on the group means yielded by each scale. In general, the significant overall statistics (Table Six) appear to reflect the gap between the high impairment level of the elderly neurology patients, on the one hand, and the lower impairment levels of the medical and psychiatric patients, on the other.

*The following individuals were consulted with regard to this and other statistical issues in this study: John Overall, Ph.D., Department of Psychiatry and Behavioral Sciences, University of Texas Medical Center, Houston, Texas; Jerry Lester, Ph.D., Institute for Computer Services and Applications, Rice University, Houston, Texas; Ron Marks, Ph.D., Department of Biostatistics, University of Florida, Gainesville, Florida.

TABLE SIX

CRITERION RELATED VALIDITY: COMPARISON TWO

MULTIVARIATE RESULTS (a)
                    F      PROBABILITY
                 4.58      .0001

UNIVARIATE RESULTS
                    F      PROBABILITY
MAIN SCALES
     RDRS        5.84      .0043
     PSMS       13.56      .0001
     GRS3       12.09      .0001
     MSBS        3.70      .0293
     PADL        8.09      .0006
ADDITIONAL SCALES
     PSMSX      13.00      .0001
     MSQ         9.66      .0002
     GRS1        4.18      .0190
     GRS2        2.12      .1274

(a) Variables: RDRS, PSMS, GRS3, MSBS, & PADL
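Fisher's least significant difference (LSD) procedure, which produced the multiple comparisons reported in Tables Seven and Nine, treats two group means as different when their absolute difference exceeds t_crit x sqrt(MSE x (1/n_i + 1/n_j)). Here MSE is the error mean square from the overall analysis and t_crit is the two-tailed critical t on its error degrees of freedom. The procedure is normally applied only after a significant overall F, as it was here. The sketch below is a minimal illustration and not the study's actual computation. The function name, the group means, the MSE, and the critical t are all hypothetical. The critical t is supplied by hand so that only the standard library is needed.

```python
import math
from itertools import combinations


def lsd_nonsignificant_pairs(means, ns, mse, t_crit):
    """Return the pairs of groups whose means do NOT differ by Fisher's LSD.

    means  -- dict mapping group label to group mean
    ns     -- dict mapping group label to group size
    mse    -- error mean square from the overall (omnibus) analysis
    t_crit -- two-tailed critical t for the chosen alpha on the error df
    """
    pairs = []
    for a, b in combinations(sorted(means), 2):
        lsd = t_crit * math.sqrt(mse * (1 / ns[a] + 1 / ns[b]))
        if abs(means[a] - means[b]) < lsd:
            pairs.append((a, b))  # such groups would be joined by a line
    return pairs


if __name__ == "__main__":
    # HYPOTHETICAL values for illustration only; not data from this study.
    means = {"neurology": 20.0, "medicine": 14.0, "psychiatry": 12.0}
    ns = {"neurology": 30, "medicine": 30, "psychiatry": 20}
    print(lsd_nonsignificant_pairs(means, ns, mse=25.0, t_crit=1.99))
```

Each returned pair corresponds to a connecting line in tables laid out like Tables Seven and Nine. Rerunning the function with the critical t for alpha = .01 would give the second (wavy-line) set.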

Comparison Three

Highly significant differences between neurologically impaired subjects and subjects without neurological impairment were found using data from all main scales together and singly, as shown in Table Eight. Neurology patients showed significantly more deficit than the other groups. To determine the sources of these overall differences, another set of LSD comparisons was performed, this time on each scale's data for the entire sample (N=140). The results are shown in Table Nine. The main difference to emerge appears to lie between elderly neurology patients, on the one hand, and psychiatric patients and nonhospitalized subjects, on the other. With every scale, non-elderly neurology inpatients and elderly medical inpatients fall between these extremes and do not differ significantly from one another.

Comparison Four

Highly significant differences between hospitalized and nonhospitalized subjects were found using all main scales together and alone (Table Ten). In every case, inpatients showed more impairment than nonhospitalized subjects. Again using the full-sample LSD results shown in Table Nine, the significant differences obtained seem to exist primarily between the nonhospitalized subjects, on the one hand, and the neurology patients (especially the elderly) and medicine patients, on the other. None of the five scales yielded significant differences between the psychiatric sample mean and the nonhospitalized sample mean.

TABLE SEVEN

A POSTERIORI MULTIPLE COMPARISONS OF GROUPS I, III, AND IV

GROUP I     GROUP III     GROUP IV

TABLE EIGHT

CRITERION RELATED VALIDITY: COMPARISON THREE

MULTIVARIATE RESULTS (a)
                    F      PROBABILITY
                 5.97      .0001

UNIVARIATE RESULTS
                    F      PROBABILITY
MAIN SCALES
     RDRS       11.79      .0008
     PSMS       23.23      .0001
     GRS3       21.17      .0001
     MSBS        8.81      .0035
     PADL       17.18      .0001
ADDITIONAL SCALES
     PSMSX      24.56      .0001
     MSQ        18.81      .0001
     GRS1        9.71      .0022
     GRS2        3.50      .0634

(a) Variables: RDRS, PSMS, GRS3, MSBS, & PADL

TABLE NINE

A POSTERIORI MULTIPLE COMPARISONS OF ALL SUBJECT SAMPLES

4 = Psychiatry, 5 = Non-inpatient

Lines connect groups whose means do not differ significantly by LSD at alpha = .05 (straight lines) and at alpha = .01 (wavy lines).

Relative Discriminative Power of the Scales

In Comparison One, the GRS 3 discriminated most powerfully (F=9.13, p=.0037), followed by the RDRS (F=5.75, p=.0197), the PSMS (F=5.05, p=.0285), and the PADL (F=4.38, p=.0407). Only the MSBS (F=2.16, p=.1467) failed to discriminate significantly between the old and young neurology patients.

In Comparison Two, the PSMS was the only one of the five main scales whose data distinguished significantly between medical and psychiatric patients, and it maintained this difference even at alpha=.01. The PSMS thus appears to be the most powerful discriminator between these groups. The GRS 3 ranks next, because it was the only other scale that maintained a significant difference between two adjacent means at alpha=.01. The RDRS and PADL, tied for third, failed to do so. The last-ranked MSBS distinguished significantly only between the extreme means (the neurology and psychiatry samples), even at alpha=.05.

In Comparison Three, as in Comparison Two, the PSMS proved the most powerful discriminator in terms of univariate F-value. The differences among the PSMS, GRS 3, and PADL may be psychologically trivial, however, given that all three yielded results significant at p=.0001. The weakness of the RDRS and MSBS is only relative in this case, as they, too, yielded clearly significant results.

TABLE TEN

CRITERION RELATED VALIDITY: COMPARISON FOUR

MULTIVARIATE RESULTS (a)
                    F      PROBABILITY
                 4.49      .0010

UNIVARIATE RESULTS
                    F      PROBABILITY
MAIN SCALES
     RDRS       12.48      .0006
     PSMS       20.32      .0001
     GRS3       10.33      .0017
     MSBS        7.95      .0057
     PADL        9.74      .0023
ADDITIONAL SCALES
     PSMSX      13.78      .0003
     MSQ        12.92      .0005
     GRS1       41.68      .0001
     GRS2        2.49      .1177

(a) Variables: RDRS, PSMS, GRS3, MSBS, & PADL

In Comparison Four, the scales can be ranked according to discriminative power, as expressed in F-values, as follows: PSMS (F=20.32, p=.0001), RDRS (F=12.48, p=.0006), GRS 3 (F=10.33, p=.0017), PADL (F=9.74, p=.0023), and MSBS (F=7.95, p=.0057).
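The F-based orderings in this section can be reproduced mechanically from the univariate F-values of the five main scales. The values below are transcribed from Tables Five, Six, Eight, and Ten. The names F_VALUES and rank_by_f are illustrative. If all five scales were analyzed on the same subjects within a comparison, they share degrees of freedom, and ordering by F is then equivalent to ordering by probability. Note that the text ranks Comparison Two partly on LSD results. The pure F ordering there (PADL above RDRS) therefore does not reproduce the tie described above.

```python
# Univariate F-values for the five main scales, transcribed from
# Tables Five, Six, Eight, and Ten of this study.
F_VALUES = {
    "One":   {"RDRS": 5.75,  "PSMS": 5.05,  "GRS3": 9.13,  "MSBS": 2.16, "PADL": 4.38},
    "Two":   {"RDRS": 5.84,  "PSMS": 13.56, "GRS3": 12.09, "MSBS": 3.70, "PADL": 8.09},
    "Three": {"RDRS": 11.79, "PSMS": 23.23, "GRS3": 21.17, "MSBS": 8.81, "PADL": 17.18},
    "Four":  {"RDRS": 12.48, "PSMS": 20.32, "GRS3": 10.33, "MSBS": 7.95, "PADL": 9.74},
}


def rank_by_f(f_values):
    """Order scale names from largest to smallest univariate F."""
    return sorted(f_values, key=f_values.get, reverse=True)


if __name__ == "__main__":
    for comparison, f_values in F_VALUES.items():
        print(f"Comparison {comparison}: {', '.join(rank_by_f(f_values))}")
```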

Table Eleven shows the overall hit rate of each scale in distinguishing between elderly neurological patients and elderly psychiatric patients. For each scale, an optimal cutting score was identified and a Bayesian analysis conducted. The highest hit rate was demonstrated by the PADL (80.85%), followed by the PSMS (78.4%), GRS 3 (77.5%), RDRS (73.3%), and MSBS (65.0%).
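This section does not spell out the cutting-score procedure. It mentions only an optimal cutting score and a Bayesian adjustment for the unequal group sizes (NEURO N=30, PSYCH N=20). One common way to make such an adjustment is to treat the two groups as equally likely a priori. That amounts to averaging the two within-group hit rates and choosing the cut that maximizes the average. This approach is an assumption, not the study's documented method. In the minimal sketch below, the function name, the scores, and the direction of scoring (higher score = more impaired) are all hypothetical.

```python
def best_cutting_score(impaired, intact):
    """Find the cut that maximizes the equal-prior (balanced) hit rate.

    A score >= cut is classified as 'impaired' (e.g., neurology) and a
    score < cut as 'intact' (e.g., psychiatry). Higher scores are ASSUMED
    to mean greater impairment. Returns (cut, hit_rate).
    """
    best_cut, best_rate = None, -1.0
    for cut in sorted(set(impaired) | set(intact)):
        hits_impaired = sum(s >= cut for s in impaired) / len(impaired)
        hits_intact = sum(s < cut for s in intact) / len(intact)
        # Weighting each group equally removes the influence of unequal Ns.
        rate = (hits_impaired + hits_intact) / 2
        if rate > best_rate:
            best_cut, best_rate = cut, rate
    return best_cut, best_rate


if __name__ == "__main__":
    # HYPOTHETICAL scores, not study data.
    neuro = [5, 6, 7, 8]
    psych = [1, 2, 3, 6]
    print(best_cutting_score(neuro, psych))  # (5, 0.875)
```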


TABLE ELEVEN

CLASSIFICATION HIT RATES: GROUPS I AND IV (a)

(a) Adjusted for unequal sample sizes: NEURO N=30, PSYCH N=20.


CHAPTER IV
DISCUSSION

Review of Findings

This section will first summarize the findings of the present study concerning the interrater reliability and the construct and criterion-related validities of the five main scales. A discussion of relevant psychometric issues will then be followed by a treatment of each scale in light of these findings.

The interrater reliability of the main scales proved uniformly high. Even the lowest reliability (MSBS, r=.853) still falls in a range that is quite acceptable for research or assessment purposes. These results closely approximate the interrater reliability estimates obtained by the scales' authors. The repeated demonstration of these scales' interrater reliability suggests that data generated by any single rater will agree substantially with data generated by other raters. It should be cautioned, however, that this assumption may be safe only if all raters who use a scale are trained in a manner and with a rigor comparable to that given research raters. Still, by replicating earlier estimates, the interrater reliability results obtained here constitute a strong basis from which validation work can justifiably be pursued.
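Correlational interrater reliability is computed from two raters' scores on the same subjects. The sketch below assumes that the coefficient is the Pearson product-moment correlation and uses hypothetical ratings and an illustrative function name. It also illustrates a caveat worth keeping in mind when reading such coefficients. A rater who scores every subject a constant amount higher than a colleague still yields r = 1.0, so r reflects consistency of ordering and spacing rather than absolute agreement.

```python
import math


def pearson_r(rater_a, rater_b):
    """Pearson correlation between two raters' scores on the same subjects."""
    if len(rater_a) != len(rater_b) or len(rater_a) < 2:
        raise ValueError("need paired scores for at least two subjects")
    n = len(rater_a)
    mean_a = sum(rater_a) / n
    mean_b = sum(rater_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(rater_a, rater_b))
    ss_a = sum((a - mean_a) ** 2 for a in rater_a)
    ss_b = sum((b - mean_b) ** 2 for b in rater_b)
    return cov / math.sqrt(ss_a * ss_b)


if __name__ == "__main__":
    # HYPOTHETICAL scale totals for five subjects rated by two raters.
    a = [10, 14, 9, 20, 17]
    b = [11, 13, 9, 19, 18]
    print(round(pearson_r(a, b), 3))
    # A constant 2-point offset between raters still gives r = 1.0.
    print(round(pearson_r(a, [x + 2 for x in a]), 3))
```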

The construct validity data provide convincing evidence that these scales, in particular the RDRS, PSMS, GRS 3, and PADL, indeed measure the same domain of behavior. The large portion of total variance explained by the ADL principal component (79%) is especially impressive. The MSBS also correlates with these scales to a high degree of significance. Even so, the data confirm the prediction that the MSBS would have lower intercorrelations than the other scales. This is presumably due to the MSBS's focus on social behavior rather than on basic self-care skills. The significant agreement obtained, however, strongly suggests that those aspects of these two behavioral domains (social and ADL) which are actually distinct from one another tend to be maintained, or to decline, together. In other words, it is certainly plausible that maintaining oneself from day to day often depends heavily upon, as well as contributes materially to, maintaining smooth social functioning.
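The 79% figure is the largest eigenvalue of the scales' correlation matrix divided by the number of scales, because the trace of a correlation matrix equals the number of variables. Each scale's loading on that component is its eigenvector element multiplied by the square root of the eigenvalue. The values of Table Three are not reproduced in this section. The sketch below therefore uses a hypothetical matrix in which every pair of five scales correlates .75. For such a matrix, the first eigenvalue is 1 + 4(.75) = 4, so the component explains 80% of the variance. Power iteration is used only to stay within the standard library. It is an assumption, not necessarily the algorithm of the software used in the study.

```python
import math


def first_principal_component(corr, iterations=1000):
    """Largest eigenvalue of a correlation matrix, its share of total
    variance, and the component loadings, found by power iteration.

    Assumes the equal-weight starting vector is not orthogonal to the
    leading eigenvector (true when correlations are mostly positive).
    """
    p = len(corr)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v'Rv for the unit vector v gives the eigenvalue.
    eigenvalue = sum(v[i] * corr[i][j] * v[j] for i in range(p) for j in range(p))
    share = eigenvalue / p  # the trace of a correlation matrix equals p
    loadings = [x * math.sqrt(eigenvalue) for x in v]
    return eigenvalue, share, loadings


if __name__ == "__main__":
    # HYPOTHETICAL: five scales, every pair correlating .75.
    r = 0.75
    corr = [[1.0 if i == j else r for j in range(5)] for i in range(5)]
    eigenvalue, share, loadings = first_principal_component(corr)
    print(f"eigenvalue={eigenvalue:.3f}, share={share:.1%}, "
          f"loadings={[round(x, 3) for x in loadings]}")
```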

The results of Comparison One indicate that, as hypothesized, older neurological patients show significantly greater impairment of ADL capacity than do younger patients. However, the social behavior data reveal only a nonsignificant trend for older patients to be more impaired than younger patients. Therefore, advanced age, or the particular neuropathological conditions that tend to accompany it, appears to have a much clearer deleterious effect on these patients' basic self-care skills than on their social behavior.

Comparison Two was a replication and extension of previous comparisons of the ADL skill levels of neurologically and functionally impaired elderly inpatients. As hypothesized, the GRS 3 and MSBS replicated their earlier discriminations between these groups. Every other scale, however, did so as well. The medical sample consistently ranked between the neurological and functional groups. This indicates that, among the elderly, neurological pathology tends to be more devastating to basic functioning than does visceral pathology without primary neurological involvement. The inability of all scales except the PSMS to discriminate significantly between the medical and psychiatric samples indicates that the PSMS can be more sensitive than the other scales to fine differences in ADL skill level.

As hypothesized, the results of Comparison Three show the effect of neuropathology even when the elderly neurology patients are clustered with their generally less impaired, younger counterparts. As in Comparison Two, that effect is greater impairment of ADL and basic social skills than is found among subjects without neuropathology. As mentioned, this is a poorly controlled contrast, partially confounding hospital status with neurological status. It is noteworthy that no main scale could draw a significant distinction between the psychiatric and nonhospitalized samples, which together formed the primary contrast to the highly impaired elderly neurology patients.

The prediction that hospitalized elderly, as a group, would show greater impairment of the behavioral skills measured than nonhospitalized elderly was confirmed by the outcome of Comparison Four. This appears to hold equally for ADL and basic social skills. Again, these significant differences emerged despite the essentially normal profile of the psychiatric sample.

In sum, these results allow every scale evaluated to claim substantial interrater reliability and construct validity. In addition, with one exception (the MSBS in Comparison One), criterion-related validity has been demonstrated for several populations along the following dimensions: (1) age among male neurology inpatients, (2) type of medical impairment, within broad categories, among elderly male inpatients, (3) presence of neurological pathology among males, and (4) inpatient status among elderly males.

Limitations and Biases of the Study

In evaluating the internal validity of this study, three factors that proved especially problematic require discussion: (1) recruitment procedures, (2) selection and preparation of raters for subject-absent scales, and (3) race of subjects. The primary limitation on this study's external validity will then be discussed.

(1) Recruitment of inpatients suffered perhaps most severely from variability in the promptness and diligence with which eligible patients were approached for participation. Although exhaustive and clear data on this point do not exist, it is likely that many eligible patients were lost as participants because of tardy recruitment or testing. This was particularly evident on the medical wards, where patient turnover is relatively quick. If patients with shorter hospitalizations were thus less likely to participate, the data might reflect a bias toward greater impairment.

Among inpatients, medical patients were also the most likely to refuse participation for lack of interest. These refusers were often noted, anecdotally, to be fairly spry, intact, and sometimes cynical individuals who probably would have fared well on the scales. Differential subject interest in the project may therefore also have biased the data toward greater impairment of medical subjects. This risk is moderated somewhat by the fact that a few refusers declined because of significant pain. One such patient died two days after refusing.
Recruitment of nonhospitalized subjects posed possibly the most serious methodological difficulty of the entire study. In contrast to the inpatient recruitment biases discussed above, the nonhospitalized sample data are probably biased substantially toward more intact ADL skills. Because all 140 subjects were administered the MSBS and PADL at GVAH, no housebound community residents participated. Early plans to recruit nonhospitalized veterans through veterans' organizations such as the Veterans of Foreign Wars (VFW) proved totally untenable, so a variety of more accessible subject pools was tapped. The largest portion of the final sample was recruited from among the retired men who do volunteer work at GVAH. Their voluntary duties consist primarily of escorting patients and carrying hospital paperwork between the various wards and clinics in GVAH. These men are uniformly alert, ambulatory, and fit. A large group of less healthy participants was recruited from among veterans making outpatient visits to the GVAH evaluation or specialty clinics. The uncomfortably arbitrary, rather than systematic, nature of the sampling procedure used to recruit nonhospitalized subjects makes it very difficult to regard the sample data as representative of elderly nonhospitalized male veterans in general. It is strongly recommended that future studies comparing these scales across the dimension of hospitalization make a serious attempt, within practical limits, to obtain representative data through systematic sampling. This will probably require the cooperation of the VA itself, or of viable local units of the VFW or American Legion, and the use of their membership rolls.

(2) Less than ideal circumstances also prevailed in the utilization of raters for the three subject-absent scales. Except in rare cases of unusual rater availability, two or three of the scales were often completed by the same rater for a given subject. As a result, the intercorrelations of these three scales (see Table Three) may be spuriously high. The risk of such a rater effect contaminating the construct validity results is probably limited by the excellent interrater reliability of these scales. Nevertheless, the possibility of such contamination cannot be ruled out without statistically comparing the data yielded by various raters. However, the distribution of raters across subject groups, both as individuals and as groups (nurses, residents, medical students, etc.), was unplanned and based largely on immediate exigencies. It was therefore extremely unbalanced, which would unfortunately make such statistical comparisons quite difficult.

An additional source of poorly controlled variance is probably present in the subject-absent data on community subjects. All raters of inpatients were professional or paraprofessional hospital staff members and received personal orientation and instruction in the purpose and use of the scales. This was not always the case with raters of nonhospitalized subjects. With perhaps half of these subjects, the scales were explained and handed to the subject, with careful instructions to pass both the scales and the explanation of their use on to the rater (usually a spouse or child). The forms were then returned to the researcher in pre-addressed, postage-paid envelopes. Practically all scales were returned promptly, appropriately completed, and bearing the raters' signatures. Nevertheless, the possibilities of confusion and non-independent rating decisions cannot be ruled out. Clearly, direct rater training and assurance of independent rating would have been desirable. Perhaps the most important ramification of the procedure used is that statistical differences obtained between Group V and the other subject groups may not reflect purely subject-related variance; methodological variance may also be reflected.

(3) As Table Two shows, the rate of black patient participation in the study varied considerably between samples. The non-elderly neurology sample was 23.3% black, the elderly neurology and medicine samples were each 16.7% black, and the psychiatry and nonhospitalized samples included no blacks at all. It is meaningful to examine both (a) the comparability of these percentages with system-wide VAH inpatient base rates and (b) the possible causes and meanings of this inter-sample variability.

Racial percentage information* on VAH inpatients is compiled nationwide from two sources: bed census on a "typical day," and hospital discharges. These figures combine data from general medical hospitals and long-term care facilities and are not broken down by age. On a "typical day" (the last Wednesday in September, 1979), 77.2% of the 68,689 VAH beds nationwide were occupied by whites and 15.8% by blacks. During fiscal year 1979, of 903,466 discharges, 77.9% were white and 16.6% were black. The percentages obtained in this study with the medicine and elderly neurology samples are highly consistent with these figures. The non-elderly neurology sample was somewhat above these figures, and the remaining samples were significantly below them.

*Source of VA system information: Phone communication, 7/30/80, Dr. William Page, Chief, Division of Biometrics, VA Central Office, Washington, D.C., 202-389-3458.

During the period of data collection for this study, every elderly black patient on the GVAH psychiatric wards carried a diagnosis of organic impairment, which made all of them ineligible for participation. The all-white psychiatric sample obtained is therefore clearly not a product of recruitment or consent bias. It is more likely a product of institutional or sociomedical selection factors, such as diagnostic biases or racial differences in hospital use patterns. There may also be racial differences in early- or late-life exposure to situations, or use of substances, that compromise central nervous system integrity.

In the community sample, however, the possibility of an insidious, though unintended, racial bias in the recruitment of subjects is distinctly greater. Although there are few or no blacks among the volunteer force at GVAH, black veterans are surely well represented among clinic outpatients. The unsystematic, somewhat opportunistic recruitment procedure used to select this sample may well have allowed a serious bias against black participation to develop.

The impact of this pattern of black participation can be assessed by studying the tests' correlations with race. The relevant questions are how the pattern affects the study's test-related conclusions and whether the results generalize to black populations. Of fifteen correlations (five tests by three groups with black subjects), only one was significant (PSMS, Group I, r=.4376, p=.015). This is about what might be expected by chance at the stated alpha level of .05. When all 140 subjects are pooled, even the largest test-by-race correlation is nonsignificant (MSBS, r=-.081, p=.338). Although certain methodological flaws are apparent, race therefore appears to account for very little test variance, and the present results may be generalizable to black as well as white populations.
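The chance argument can be checked with a binomial calculation. Suppose the fifteen tests were independent and each had a .05 probability of a false positive. The expected number of significant results would then be 15 x .05 = .75, and the probability of at least one would be 1 - .95^15, or about .54. The tests are not in fact independent, because the scales are highly intercorrelated and share subjects within each group. The sketch below, whose function name is illustrative, therefore gives only a rough benchmark.

```python
from math import comb


def prob_at_least(k, n_tests, alpha):
    """P(at least k 'significant' results among n_tests independent null tests)."""
    return sum(
        comb(n_tests, i) * alpha**i * (1 - alpha) ** (n_tests - i)
        for i in range(k, n_tests + 1)
    )


if __name__ == "__main__":
    n_tests, alpha = 15, 0.05  # five tests x three groups with black subjects
    print(f"expected false positives: {n_tests * alpha:.2f}")
    print(f"P(>= 1 significant): {prob_at_least(1, n_tests, alpha):.3f}")
    print(f"P(>= 2 significant): {prob_at_least(2, n_tests, alpha):.3f}")
```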

The present sample was entirely male. In addition, Smith, Bright, and McClosky (1977) found that women in a large study showed significantly greater impairment of ADL skills than men. The results of the present study therefore cannot be generalized to female populations.

In addition, it must be recognized that the results of this and earlier studies of these scales cannot be immediately generalized to other populations of male subjects. To attempt such generalizations without first replicating earlier findings with each new population would be cavalier and unscientific.

Choosing a Scale

Despite the limitations and biases discussed above, the results of this study and previous research indicate that these five scales are psychometrically sound in several important respects. They are all highly reliable between raters, and they all show high agreement among themselves. With one exception, they also proved very sensitive to variability in functional competence associated with selected relevant criteria. None failed to replicate earlier reported specific discriminations between groups. Consumers of ADL scales are therefore placed in the enviable position of having to choose the most appropriate measure from among a group of highly qualified instruments. They have the luxury of making choices on the basis of various clinical, pragmatic, and situation-specific factors without sacrificing psychometric quality.
Subject-Absent vs. Subject-Present Scales

One decision a consumer of ADL scales may have to make first in choosing a scale is between subject-absent and subject-present instruments. This decision may have several elements.

First, the present study demonstrated that both types can have excellent interrater reliability, criterion-related validity, and correct classification rates. However, the best interrater reliability and lowest misclassification rate were obtained by a subject-present scale (PADL), whereas the subject-absent scales generally discriminated best among groups.

Second, logistical considerations point out important differences between the two methods. Use of subject-absent scales assumes rater familiarity with subjects, whereas subject-present measures do not. The present study assumed that familiarity takes at least a week to develop, which can be an untenable limitation for some assessment purposes.

Third, subject-present scales require carefully standardized administration, and thus more extensive rater training and practice than do subject-absent scales. On the other hand, subject-present scales may require the involvement of fewer personnel.

Fourth, proper administration of the subject-present scales requires space and special materials, as well as the subject's time and cooperation. Subject-absent scales require none of these and are more portable.

Finally, retesting with subject-absent scales presents no major difficulties. Retesting can be quite problematic with subject-present scales, however, unless a considerable intertest interval is allowed, because practice effects on specific test tasks may compromise the generalizability of results. Also, the requirement that certain standardized administration procedures appear accidental to the subject (MSBS) would be difficult to fulfill in subsequent administrations.

A synthesis of the above considerations suggests the following guidelines for those faced with the choice between ADL assessment methods. Subject-absent scales are probably the technology of choice (1) whenever many indigenous raters are available and trainable, (2) when subjects may lack motivation to be assessed, and (3) especially when retesting of subjects is planned or desirable. Such availability of raters, often professional, will exist in most institutions, especially long-term facilities, where familiarity poses no problem. Subject-present scales, on the other hand, seem most appropriate for research and screening purposes involving one-time evaluation of subjects. These scales encourage the repeated involvement of a very few trained raters who need know nothing about the subject except what they learn during testing. With these scales, the lack or inaccessibility of indigenous, familiar, and trainable raters is no barrier, but retesting may be.

Summary of Findings on Each Scale

The PSMS, actually the briefest scale studied, proved to be the best of the three subject-absent scales. It was the strongest discriminator of all the scales on three of the four validity criteria, and it had the best classification hit rate within its method between neurological and psychiatric patients. This scale, as published, is exceedingly simple, with a scoring procedure that ignores more information than it recognizes. This study systematically compared the standard scoring procedure (PSMS) with a more comprehensive one (PSMSX), which gave proportional scores for partial competence on each item rather than credit only for full competence. The comprehensive procedure, however, failed to yield noticeable gains in psychometric strength. The brevity, simplicity, and psychometric quality of this scale recommend it highly for applied use and further study. It is clearly the scale of choice when medical patients are to be included in comparisons.
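The difference between the two scoring procedures can be sketched as follows. Several details are hypothetical assumptions for illustration and should be checked against the published PSMS scoring rules before any practical use: the item count, the number of rating levels, the direction of scoring (a higher level means greater competence, and the top level means full competence), and the function names.

```python
def psms_standard(ratings, full_level):
    """Standard scoring: one point per item rated at full competence only."""
    return sum(1 for r in ratings if r == full_level)


def psms_extended(ratings, full_level):
    """Extended (PSMSX-style) scoring: proportional credit for partial competence."""
    return sum(r / full_level for r in ratings)


if __name__ == "__main__":
    # HYPOTHETICAL ratings on six items, each rated 0 (no competence)
    # to 4 (full competence).
    ratings = [4, 4, 3, 2, 4, 0]
    print(psms_standard(ratings, full_level=4))  # 3
    print(psms_extended(ratings, full_level=4))  # 4.25
```

Under standard scoring, an item rated 3 and an item rated 0 both earn nothing. The extended score preserves that distinction. In the present data, however, this extra information did not noticeably improve psychometric performance.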

The GRS was described above (see METHOD) as having the strongest base of psychometric support among the scales chosen for study. It was therefore predicted that its ADL factor would prove the best in the present comparisons of psychometric quality. Except in Comparison One, this did not strictly prove to be the case. Nevertheless, GRS 3 clearly emerged as a highly reliable, useful, and powerful scale. As an ADL scale, it is more than adequate. Perhaps the greatest strength of GRS 3 is that it is unlikely to be used in isolation. It will thus provide its users with information on the other GRS factors, Withdrawal/Apathy and Antisocial Disruptive Behavior. Data obtained in the present study indicate that GRS 2 needs a great deal of improvement before it can stand on its own psychometrically. This limitation, however, is outside the scope of this study and only slightly detracts from the considerable overall quality of the GRS.

The RDRS also emerged as a quite adequate scale of ADL in all psychometric respects, but it was generally the weakest of the ADL scales studied. Its relative merit lies, therefore, less in its present performance than in earlier reported empirical data which (1) demonstrated various criterion-related validities with nursing home patients and (2) generated norms on a sample of 1000 male nursing home residents.
Of all the scales studied, the PADL demonstrated the best interrater reliability and the lowest misclassification rate between old neurological and psychiatric patients. It trailed the subject-absent scales slightly, however, on criterion-related validity tests. Its strengths coincide with those of subject-present scales described in general above: it can save time in certain situations and make measurement possible when appropriate raters are not accessible or do not coexist naturally with subjects. Although special equipment is needed, it is not expensive, and administration is very quick.

Although  the  MSBS  was  clearly  the  weakest  of  the  five  scales  with 
regard  to  the  criterion-related  comparisons  and  hit  rate  calculations 
performed  in  this  study,  it  must  be  recognized  as  a respectable 
measure  in  its  own  right  which  failed  to  yield  significant  results  in 
only  one  comparison.  The  relative  weakness  of  its  convergent  validity 
with  obvious  measures  of  ADL  is  consistent  with  its  intended  construct 
content, and indicates only the need for studies comparing it directly
with other measures of minimal social behavior. It appears that
this construct may be less sensitive than ADL to health status, the primary
focus of health care delivery systems. As mentioned
above (p. 18), the MSBS contains the Mental Status Questionnaire (MSQ), a
ten-item index of orientation to time and place and of memory for
typically overlearned facts. Despite the significant similarities
between MSBS data and ADL scale data, the practicality of the MSBS is
undermined by the fact that the much briefer MSQ outperforms the MSBS on
every test in this study.
Further Research

The most pressing research needs with regard to the five scales
evaluated here are no longer psychometric; rather, they concern
testing and improving the scales' applicability to clinical tasks
such as individual assessment and treatment evaluation. Perhaps the
most basic need is for good normative data for the scales on a wide
variety of subject populations. Such normative data would allow the
generalizability of research findings using the scales to be properly
assessed. Representative sampling procedures should be pursued
vigorously in generating these normative samples. The availability
of normative distribution data will allow individual subjects to be
placed along a realistic continuum of competence.

Practical  considerations  such  as  subject  availability  contributed 
to the restriction of the present study to men. Much previous work
with these scales (e.g., Linn et al., 1977) also involved only
males. Previous findings of sex differences in ADL skill among the
elderly (Smith, Bright, & McClosky, 1977) make it important to control
for sex in this research area. Needless to say, however, the need for
quality assessment techniques is equally great with female populations.
New norms should include equal but separate female samples.
Validity studies already completed with male subjects should be replicated
with female subjects.

Longitudinal  research  to  investigate  the  relative  stability/ 
sensitivity  to  change  of  various  ADL  scales  should  also  be  conducted. 
Such  assessments  of  test-retest  reliability  would  most  informatively 
be  done  using  several  ADL  scales  in  the  same  study,  so  that,  as  in  the 
present  study,  the  data  will  be  directly  comparable.  This  is  the  best 
approach to the dilemma of identifying true, underlying stability or
change.  One  clinically  useful  context  in  which  such  data  might  be 
gathered  would  be  a long-term  project  plotting  the  clinical  course  of 
a sample  of  patients  with  a selected  degenerative  neurological 
disease. 

ADL  skill  is  the  behavior  domain  targeted  by  Reality  Orientation 
Therapy.  As  amply  demonstrated  above,  the  research  to  date  has  not 
appropriately  evaluated  the  effectiveness  of  this  treatment.  This 
study  has  demonstrated  that  appropriate  and  high  quality  instruments 
are  available  for  a proper  evaluation  of  ROT  and  any  other  treatment 
designed  to  affect  ADL  competence.  This  research  should  now  proceed. 

The  relationship  of  ADL  competence  to  performance  on  popular 
intellectual  and  neuropsychological  measures  could  be  studied  with 
relative ease, simply by adding a sound ADL instrument to the
neuropsychological battery. It is reasonable to predict that ADL
competence  would  be  found  to  behave  in  general  more  like  other 
measures  of  old  learning  (Information,  Vocabulary,  Comprehension)  than 
like  measures  of  new  learning  and  higher  level  adaptive  functioning 
(Digit Symbol, Block Design, Form Board). This is because, in most
neurological conditions common among elderly patients, concrete
overlearned behaviors generally prove more robust and tenacious than
behaviors that are rarely elicited or only selectively appropriate
and that require complex cognitive mediation.

The classification hit rates of the ADL scales studied here
(73.3% to 80.85%) suggest that the scales may be valuable additions to
the battery used by clinical psychologists who are frequently asked to assist
in the differential diagnosis between depression and organic impairment
in elderly patients. Of course, the cutting scores used in this study
to calculate hit rates should not be directly applied to new samples
or other populations. Widely applicable cutting scores must be based
on extensive normative data. Independent validation is also required
because an upward bias is inherent when any statistic is both calculated
from and tested on the same sample. Nevertheless, studies
assessing  the  incremental  validity  of  ADL  scales  in  such  contexts  as 
depression  versus  organicity  decisions  would  be  valuable. 
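The upward bias described above can be illustrated with a minimal Python sketch. It assumes, hypothetically, that higher scale scores indicate greater impairment, that a case is classified as organic (label 1) when its score meets or exceeds the cutting score, and that the cutting score is chosen to maximize the hit rate in the derivation sample. The function names and all scores below are invented for illustration; they are not data or procedures from this study.

```python
def hit_rate(scores, labels, cut):
    """Proportion of cases classified correctly when a score at or above
    `cut` is called organic (label 1) and a score below it depressed (label 0)."""
    correct = sum((s >= cut) == bool(y) for s, y in zip(scores, labels))
    return correct / len(scores)


def best_cut(scores, labels):
    """Choose, from the observed scores, the cutting score that maximizes
    the hit rate in this same sample (ties go to the lowest score)."""
    return max(sorted(set(scores)), key=lambda c: hit_rate(scores, labels, c))


# Hypothetical derivation sample: ADL scores and diagnoses (1 = organic).
derivation_scores = [2, 3, 5, 6, 8, 9]
derivation_labels = [0, 0, 1, 0, 1, 1]

# Hypothetical independent validation sample.
validation_scores = [4, 5, 7, 8]
validation_labels = [0, 1, 0, 1]

cut = best_cut(derivation_scores, derivation_labels)
print("cutting score:", cut)
print("derivation hit rate:", round(hit_rate(derivation_scores, derivation_labels, cut), 3))
print("validation hit rate:", round(hit_rate(validation_scores, validation_labels, cut), 3))
```

In this toy example the cutting score fitted to the derivation sample (5) classifies 83.3% of that sample correctly but only 75% of the new sample. Because the cutting score was tuned to the first sample's particular pattern of scores, its in-sample hit rate tends to overstate how well it will perform elsewhere, which is why independent validation is needed.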

Significance  of  the  Study 

With the information generated by this study, future investigators
researching the behavioral outcome of geriatric treatment
programs are now better able to make objective and defensible choices
of dependent variable measures. As explained above, past difficulties
in making such choices have had a noticeably deleterious
effect  on  the  quality  of  some  gerontological  research.  This  study  is 
a positive  forward  step  in  the  crucial,  yet  understudied,  areas  of 
scale  reliability  and  validity.  Future  gerontological  researchers  who 
heed  and  use  the  results  of  this  study  will  obtain,  in  their  own  work, 
findings  which  are  more  accurate  and  more  credible,  and  therefore  of 
much  more  use  to  the  administrative  planners  who  must  ultimately  apply 
those  findings.  The  clinician's  individual  assessment  armamentarium 
has  also  been  augmented  by  this  study.  Thus,  this  project  offers 
important  contributions  to  the  development  of  geriatric  care  delivery 
systems  on  the  clinical,  research,  and  administrative  levels. 


APPENDIX  A 


SUBJECT  INFORMED  CONSENT  FORM 


PARTICIPANT'S  CONSENT  FORM 


PROJECT  TITLE:  VALIDATION  STUDY  OF  SEVERAL  GERIATRIC  RATING  SCALES. 

PRINCIPAL  INVESTIGATORS:  PAUL  K.  CHAFETZ  & DR.  W.  RICE 

NAME  ADDRESS  

HOSPITAL  No.  (IF  PATIENT)  DATE TIME  WARD 

I agree  to  participate  in  the  research  described  below: 

This  is  a study  comparing  five  behavior  rating  scales 
often  used  to  measure  people's  activities  in  areas  such  as 
eating,  grooming,  socializing,  and  moving  about.  With 
information  from  participants  such  as  yourself,  we  hope  to 
choose  the  best  of  these  scales  for  further  use.  Although 
the  knowledge  gained  by  this  study  may  not  be  of  direct  or 
immediate  benefit  to  you,  it  will  be  very  useful  in  the 
evaluation  of  new  treatments  for  people  like  you. 

As  a participant,  you  will  be  rated  with  all  five 
scales. The nurses will complete three of the scales
without you during their rounds. That is, they will complete a
brief  questionnaire,  based  on  their  informal  observations  of 
your  behavior.  The  other  two  scales  will  be  completed  with 
you  personally  by  a project  investigator,  and  will  take  less 
than  one  hour  in  all.  You  will  be  asked  to  answer  some 
short  questions,  and  to  perform  a few  daily  behaviors,  such 
as  using  a spoon.  Participation  in  this  study  does  not 
involve  any  additional  medical  procedures,  so  the  only 
discomfort  you  might  experience  is  slight  fatigue  during 
testing.  Information  gained  through  these  scales  may  be 
shared  with  your  treatment  team  and  with  the  staff  of  this 
research  project,  but  is  otherwise  strictly  confidential. 

Your  participation  is  very  much  appreciated,  and  we 
will  gladly  answer,  as  well  as  we  can,  any  questions  you  may 
have. 

I have  read,  and  understand,  the  nature  and  purpose  of  this  research, 
as  described  above.  Furthermore,  it  is  agreed  that  the  information 
gained  from  this  investigation  may  be  used  for  educational  purposes 
which  may  include  publication.  I understand  that  I may  withdraw  my 
consent,  and  discontinue  my  participation,  at  any  time  without 
prejudice.  I also  understand  that,  in  the  event  of  my  sustaining  a 
physical  injury  which  is  proximately  caused  by  this  experiment,  no 
professional  medical  care  will  be  provided  me  without  charge. 

WITNESS SIGNED  

I,  the  undersigned,  have  defined  and  fully  explained  the  nature  and 
purpose of this project to the above person and/or the person
authorized to consent for him.

SIGNED 
APPENDIX B


SCALES 


Scale  1 

WARD  (RDRS)  Date 

Subject's Name    Rater's Name


Directions: On the basis of your knowledge about the subject, at the present time, will you please rate the following items:

EATING

no assistance

moderate assistance

considerable assistance

DIET

regular diet

modified regular diet

special diet

REQUIRES MEDICATION

rarely

occasionally

every day

SPEECH

not impaired

moderately impaired

unable to be understood

HEARING

normal

moderately impaired

deaf

SIGHT

normal (glasses)

moderately impaired

blind

WALKING

no assistance

crutches; someone's help

unable to walk

BATHING

no assistance

moderate assistance

considerable assistance

DRESSING 

no  assistance 

moderate  assistance 

considerable  assistance 

INCONTINENCE 

never 

occasionally

all  the  time 

SHAVING 

no  assistance 

moderate  assistance 
considerable  assistance 

SAFETY  SUPERVISION 

never 
sometimes 
all  the  time 

CONFINED  TO  BED 
not  at  all 

part  of  the  day 

all  the  time 

MENTALLY  CONFUSED 
never 

occasionally
all  the  time 

UNCOOPERATIVE 

never 

occasionally
all  the  time 

DEPRESSION 

never 

occasionally
all  the  time 
Scale 2

WARD  (PSMS)  DATE 

Subject's Name    Rater's Name

Check one answer in each category.

TOILET

Cares  for  self  at  toilet  completely;  no  incontinence 

Needs to be reminded, or needs help in cleaning self, or has rare accidents (weekly at most)

Soiling  or  wetting  while  asleep,  more  than  once  a week 

Soiling  or  wetting  while  awake,  more  than  once  a week 

No  control  of  bowels  or  bladder 

FEEDING 

Eats  without  assistance 

Eats with minor assistance at meal times, with help in preparing food or with help in cleaning up after meals

Feeds  self  with  moderate  assistance  and  is  untidy 

Requires  extensive  assistance  for  all  meals 

Does not feed self at all and resists efforts of others to feed him

DRESSING 

Dresses,  undresses,  and  selects  clothes  from  own  wardrobe 

Dresses,  and  undresses  self,  with  minor  assistance 

Needs  moderate  assistance  in  dressing  or  selection  of  clothes 

Needs major assistance in dressing but cooperates with efforts of others to help

Completely unable to dress self and resists efforts of others to help

GROOMING  (Neatness,  hair,  nails,  hands,  face,  clothing) 

Always  neatly  dressed  and  well-groomed,  without  assistance 

Grooms self adequately, with occasional minor assistance, e.g., in shaving

Needs  moderate  and  regular  assistance  or  supervision  in  grooming 

Needs total grooming care, but can remain well groomed after help from others

Actively  negates  all  efforts  of  others  to  maintain  grooming 

PHYSICAL  AMBULATION 
Goes  about  grounds  or  city 

Ambulates  within  residence  or  about  one  block  distance 

Ambulates with assistance of (check one): (a) another person, (b) railing, (c) cane, (d) walker, or (e) wheelchair: (1) gets in & out without help, (2) needs help getting in & out

Sits unsupported in chair or wheelchair, but cannot propel self without help

Bedridden  more  than  half  the  time 


(PSMS, p. 2)

BATHING


Bathes  self  (tub,  shower,  sponge  bath)  without  help 

Bathes  self,  with  help  in  getting  in  and  out  of  tub 

Washes  face  and  hands  only,  but  cannot  bathe  rest  of  body 

Does  not  wash  self,  but  is  cooperative  with  those  who  bathe  him 

Does  not  try  to  wash  self,  and  resists  efforts  to  keep  him  clean 




Scale  3 

WARD  (GRS)  DATE  

Subject's Name    Rater's Name

Directions:  Check  one  answer  under  each  question. 

WHEN  EATING,  THE  SUBJECT  REQUIRES: 

No  assistance  (feeds  himself) 

A little  assistance  (needs  encouragement) 

Considerable  assistance  (spoon  feeding,  etc.) 

THE  SUBJECT  IS  INCONTINENT: 

Never 

Sometimes  (once  or  twice  per  week) 

Often  (three  times  per  week  or  more) 

WHEN  BATHING  OR  DRESSING,  THE  SUBJECT  NEEDS: 

No  assistance 

Some  assistance 

Maximum  assistance 

THE SUBJECT WILL FALL FROM HIS BED OR CHAIR UNLESS PROTECTED BY SIDE RAILS:

Never  Sometimes  Often 

WITH  REGARD  TO  WALKING,  THE  SUBJECT: 

Has  no  difficulty 

Needs  assistance  in  walking 

Does  not  walk 

THE  SUBJECT'S  VISION,  WITH  OR  WITHOUT  GLASSES,  IS: 

Apparently  normal 

Somewhat  impaired 

Extremely poor

THE  SUBJECT'S  HEARING  IS: 

Apparently  normal 

Somewhat  impaired 

Extremely  poor 

WITH REGARD TO SLEEP, THE SUBJECT:

Sleeps  most  of  the  night 

Is  sometimes  awake 

Is  often  awake 

DURING  THE  DAY,  THE  SUBJECT  SLEEPS: 

_ Sometimes  _ Often  _ Most  of  day 
(GRS, p. 2)

WITH  REGARD  TO  RESTLESS  BEHAVIOR  AT  NIGHT,  THE  SUBJECT  IS: 

Seldom  restless 
Sometimes  restless 
Often  restless 


THE  SUBJECT'S  BEHAVIOR  IS  WORSE  AT  NIGHT  THAN  IN  THE  DAYTIME: 

Never  Sometimes  Often 

WHEN  NOT  HELPED  BY  OTHER  PEOPLE,  THE  SUBJECT'S  APPEARANCE  IS: 

Almost  never  sloppy 
Sometimes sloppy
Almost  always  sloppy 

THE  SUBJECT  MASTURBATES  OR  EXPOSES  HIMSELF  PUBLICLY: 

Never  Sometimes  Often 

THE SUBJECT IS CONFUSED (UNABLE TO FIND HIS WAY AROUND WARD OR HOUSE, LOSES THINGS):

Almost never  Sometimes  Often

THE SUBJECT KNOWS THE NAMES OF:

More than one member of the staff

Only one member of the staff

None of the staff

THE SUBJECT COMMUNICATES IN ANY MANNER (BY SPEAKING, WRITING, OR GESTURING) WELL ENOUGH TO MAKE HIMSELF EASILY UNDERSTOOD:

Almost always  Sometimes  Almost never

THE SUBJECT REACTS TO HIS OWN NAME:

Almost always  Sometimes  Almost never

THE SUBJECT PLAYS GAMES, HAS HOBBIES, ETC.:

Often  Sometimes  Almost never

THE SUBJECT READS BOOKS OR MAGAZINES:

Often  Sometimes  Almost never

THE SUBJECT WILL BEGIN CONVERSATIONS WITH OTHERS:

Often  Sometimes  Almost never

THE SUBJECT IS WILLING TO DO THINGS ASKED OF HIM:

Often  Sometimes  Almost never

THE SUBJECT HELPS WITH CHORES ON THE WARD:

Often  Sometimes  Almost never

WITHOUT BEING ASKED, THE SUBJECT PHYSICALLY HELPS OTHER PATIENTS:

Often  Sometimes  Almost never
(GRS, p. 3)

WITH  REGARD  TO  OTHER  PEOPLE  ON  THE  WARD,  THE  SUBJECT: 

Has  several  friends 

Has  just  one  friend 

Has  no  friends 

THE  SUBJECT  TALKS  WITH  OTHER  PEOPLE  ON  THE  WARD: 

Often  Sometimes  Almost  never 

THE  SUBJECT  HAS  A REGULAR  WORK  ASSIGNMENT: 

Away  from  the  ward 

On  the  ward 

No  regular  assignment 

THE SUBJECT IS DESTRUCTIVE OF MATERIALS AROUND HIM (BREAKS FURNITURE, TEARS UP MAGAZINES, ETC.):

Never  Sometimes  Often 

THE SUBJECT DISTURBS OTHER PATIENTS OR STAFF BY SHOUTING OR YELLING:

Never  Sometimes  Often

THE SUBJECT STEALS FROM OTHER PATIENTS OR STAFF MEMBERS:

Never  Sometimes  Often

THE SUBJECT VERBALLY THREATENS TO HARM OTHER PATIENTS OR STAFF:

Never  Sometimes  Often

THE SUBJECT PHYSICALLY TRIES TO HARM OTHER PATIENTS OR STAFF:

Never  Sometimes  Often
Scale 4

WARD  (MSBS)  DATE  

Subject's Name    Rater's Name

A. E goes to S and introduces himself: "I am ____, Mr. ____. I'm glad to meet you," extending hand.

1)  Score  + if  any  discernible  response  to  greeting  1) 

2) Score + if response is verbal and appropriate    2)

3)  Score  + if  S offers  hand  to  E 3) 

B.  E says  either  (a)  "Won't  you  have  a seat?"  or 

(b)  "May  I sit  with  you  for  a while?" 
depending  on  whether  S is  brought  to  E,  or  E comes  to  S. 

4) Score + if (a) S sits without urging or (b) S assents or acknowledges E's comment    4)

C.  E says,  "How  are  you  today?" 

5)  Score  + if  any  discernible  response  to  question  5) 

6) Score + if response is verbal and appropriate    6)

D.  E drops  pencil  by  pushing  it  off  desk,  ostensibly  by  accident. 

If  S does  not  pick  up  pencil  spontaneously,  E says,  "Would  you 
pick  up  the  pencil  for  me?" 

7)  Score  + if  S picks  up  pencil  at  all  7) 

8) Score + if S picks up pencil spontaneously    8)

E. E says, "I have something I want to show you." E holds Figure A of the Bender Gestalt Test in front of S.

9)  Score  + if  S looks  at  Bender  card.  9) 


F.  "Here  is  a pencil."  E offers  it  to  S,  puts  paper  in  front  of  S, 
and  says,  "I  would  like  you  to  copy  this  drawing  on  this  paper." 


10)  Score  + if  S accepts  pencil  without  further  urging  10) 

11)  Score  + if  S makes  any  mark  on  paper  11) 

12) Score + if S draws an appropriate circle and 4-sided figure    12)

G.  E says,  "How  are  you  getting  along?" 

13)  Score  + if  any  discernible  response  to  the  question  13) 

14) Score + if response is verbal and appropriate    14)

H. E crumples a scrap of paper and tosses it at a wastebasket previously placed next to S, purposely missing.

15) Score + if S spontaneously picks up paper and deposits it in wastebasket.    15)
(MSBS, p. 2)

I.  E says,  "I  have  a few  questions  I would  like  to  ask  you." 

a.  Where  are  we  now?  (Correct  name  of  place) 

b.  Where  is  this  place?  (Correct  city) 

c.  What  is  today's  date?  (Day  of  month) 

d.  What  month  is  it? 

e. What year is it?    (MSQ)

f.  How  old  are  you? 

g.  What  is  your  birthday?  (Month) 

h.  What  year  were  you  born? 

i.  Who  is  the  President  of  the  United  States? 

j.  Who  was  President  before  him? 

16) Score + if S makes any verbal response, irrespective of content, to all ten questions (a-j).    16)

J.  E places  a magazine  in  front  of  S and  busies  himself  with  writing 
on  pad  while  saying,  "I'll  be  busy  a minute." 

17)  Score  + if  S turns  at  least  one  page  of  magazine.  17) 

K. E rises and extends hand, saying, "Thank you very much, Mr. ____."


18) Score + if S acknowledges E's departure either verbally or with gesture.    18)

L.  The  remainder  of  the  items  are  based  on  E's  judgement  of 
the  behavior  of  the  patient  throughout  the  interview: 

19) Score + unless inappropriate grimaces or mannerisms are readily apparent    19)

20)  Score  + if  the  patient  at  any  time  looks  E in  the  eye  20) 

21) Score + unless S obviously appears to avoid E's gaze at all times, or stares at E fixedly    21)

22) Score + unless S sits in a bizarre position or is in constant motion or is nearly motionless    22)

23) Score + unless S's clothes are obviously disarranged, unbuttoned, or misbuttoned    23)

24) Score + unless S is drooling or nasal mucus is visible or food deposits are conspicuous on clothes or face    24)

25) Score + unless S attempts to move away from E before termination of interview with explanation    25)
Scale 5

WARD  (PADL)  DATE  

Subject's Name    Rater's Name

SCORING: 0 = does not do task reasonably well on own; requires active personal assistance.

1 = does task reasonably well on own or with some mechanical aid

9 = inapplicable or not asked

"I  am  going  to  ask  you  to  do  some  tasks  for  me.  Some  of  the  things  I 
ask  you  to  do  may  seem  silly  or  too  easy,  but  I ask  everyone  to  do  the 
same  things." 


Place paper cup (empty) in front of patient: "Show me how you drink from this cup."

1) Grasps cup in hand    1) _

2) Lifts cup upright    2) _

3) Touches cup to mouth    3) _

4) Tips cup as if drinking    4) _

Place tissue box in front of patient: "Show me how you use a tissue to wipe your nose."

5) Takes tissue out of box    5) _

6) Directs tissue to nose    6) _

7) Makes wiping motions    7) _

Place comb on table in front of patient: "Show me how you comb your hair."

8) Picks up comb    8) _

9) Grasps comb properly    9) _

10) Brings comb to hair    10) _

11) Makes combing motion    11) _

Place emery board in front of patient: "Show me how you file your nails; use this."

12) Takes emery board in hand    12) _

13) Applies emery board to nail    13) _

14) Files nail    14) _
(PADL, p. 2)


Place shaver in front of patient: "Show me how you use this."

15) Takes shaver in hand    15) _

16) Brings to appropriate place on face    16) _

17) Makes shaving motions    17) _

Place rubber ball on spoon and place spoon on flat surface in front of patient: "Show me how you lift this spoon with the ball on it to your mouth."

18) Grasps spoon by handle    18) _

19) Keeps spoon horizontal    19) _

20) Keeps ball balanced on spoon    20) _

21) Aims at mouth    21) _

22) Touches ball to mouth    22) _

Place faucet on block in front of patient: "Show me how you use this faucet; turn it on for water and off again."

23) Puts hand on faucet    23) _

24) Turns faucet in one direction (on)    24) _

25) Turns faucet in opposite direction (off)    25) _

Place light switch in front of patient: "Show me how you turn this switch on for light, and off again."

26) Puts finger on switch    26) _

27) Flicks switch in one direction    27) _

28) Flicks back to original position    28) _

Give patient jacket with long sleeves: "Put this jacket on for me."

29) Takes hold of jacket    29) _

30) Slips one arm in jacket arm    30) _

31) Pulls jacket over shoulders and back    31) _

32) Slips other arm into sleeve    32) _
"Take it off now."

33) Frees one arm    33) _

34) Frees other arm    34) _

35) Removes jacket from body    35) _

Point to button on jacket. Ask irrespective of ratings above: "Button that for me."

36) Places hand on button    36) _

37) Slips button into button hole    37) _

"Now unbutton it."

38) Places hand on button    38) _

39) Releases button from hole    39) _

Place slipper on floor in front of patient: "Put this slipper on and take it off."

40) Moves foot to slipper    40) _

41) Inserts foot into slipper    41) _

42) Takes foot out of slipper    42) _

"Do you have real teeth or dentures? Tell me how you take care of your teeth." Place toothbrush in front of patient: "Show me how you brush your teeth. Use this toothbrush."

43) Takes brush in hand    43) _

44) Points bristles toward teeth    44) _

45) Puts brush in or near mouth    45) _

46) Makes brushing motions    46) _

If patient has removable false teeth, ask: "Would you mind taking out your false teeth? Put them back in."

47) Removes teeth    47) _

48) Replaces teeth    48) _
Place phone in front of patient: "Use this phone to call information (please dial 'O' for Operator)."

49) Picks up receiver    49) _

50) Puts phone to ear    50) _

51) Puts hand on dial    51) _

52) Dials number    52) _

Place paper and pencil in front of patient: "Write your name on this paper."

53) Takes pencil in hand    53) _

54) Moves pencil to paper    54) _

55) Makes writings on paper    55) _

56) Writes name appropriately or legibly    56) _

Show  patient  doorknob  and  lock,  and  lay  key  down  on  table;  say:  "This 
door  is  locked.  Show  me  how  you  open  it.  Open  the  lock  first.  Use 
this  key.  Turn  the  knob  and  lock  it  again.  Take  the  key  out." 


If patient cannot manage lock, rate 0 for items referring to lock. Ask patient: "Show me how you'd open the door if it weren't locked," and rate items for knob.

57) Takes key in hand    57) _

58) Touches lock with key    58) _

59) Inserts key in lock    59) _

60) Turns key in lock (to open)    60) _

61) Grasps doorknob    61) _

62) Turns doorknob    62) _

63) Turns key (to relock)    63) _

64) Takes key out of lock    64) _

Show  patient  clock  face:  "What  time  does  this  clock  say  it  is?" 

65)  Reads  one  hand  correctly  65) 

66)  Reads  other  hand  correctly  66) 


APPENDIX  C 


PRACTICAL  PROBLEMS  OF  DOING  RESEARCH 


This  project  involved  no  complex  technology  or  instrumentation. 
Materials  included  mimeographed,  paper  and  pencil  scales,  and  a 
variety of props that were either readily available (e.g., a
magazine),  or  constructed  with  supplies  from  a hardware  store  (e.g., 
an  outdoor  spigot  mounted  on  wood).  Assembling  these  materials  posed 
no  serious  or  lasting  obstacles. 

The  substantial  difficulties  encountered,  rather,  were  in  some 
cases  institutional,  and  in  all  cases  people-based.  The  purpose  of 
the  following  discussion,  it  should  be  understood,  is  not  to  malign  or 
attribute  blame,  but  rather  to  allow  future  investigators  to  more 
thoughtfully  and  knowledgeably  plan  the  nature  and  course  of  their 
projects. 

One  striking  fact  emerged  early  in  the  project.  The  freedom  to 
attempt  to  implement  this  research  project  required  the  written 
approval of surprisingly many individuals: not only the principal
investigator's (PI) doctoral committee and department chairman, but
two  human  subject  committees  (Shands  and  Veterans  Hospitals)  with 
separate  and  different  protocols,  and  the  chiefs  of  each  GVAH  service 
involved  (medicine,  neurology,  psychiatry,  and  nursing). 

Since  most  of  the  data  in  this  study  were  generated  by  raters, 
the  question  of  who  would  serve  as  raters  was  one  of  central 
importance.  Ideally,  of  course,  an  unlimited  pool  of  competent  and 
cooperative  raters  would  have  been  available.  Several  groups  were 
considered,  including  practically  every  care  delivery  discipline  at 
every  level  of  training  to  be  found  in  a teaching  hospital.  The 
possibility  of  appointing  our  own  full-time,  roving  rater  was  even 
considered. In contrast to the ideal, however, it was quickly
discovered that participation in such a research project was among the
lowest  of  priorities  for  many  of  the  groups  approached.  In  certain 
cases,  additionally,  unfavorable  supervisor-employee  relations  made  it 
impossible  for  supervisors  to  exert  influence  on  their  subordinates  to 
participate.  Subordinates,  when  approached  directly  by  the  PI, 
occasionally tried to enmesh the PI in these conflicts by offering to
participate  if  the  PI  could  extract  certain  concessions  from  their 
supervisors.  This  was  clearly  not  an  appropriate  role  for  a graduate 
student/researcher. 

Obtaining  permission  to  request  individual  medical  students  and 
residents  on  the  involved  wards  to  participate  was  particularly  time- 
consuming.  Largely  at  the  insistence  of  the  chairman  of  the  GVAH 
human  subjects  committee,  the  PI  had  to  secure  written  consent  from 
the  directors  of  medical  education  for  the  three  GVAH  services.  These 
in  turn  insisted  on  seeing  the  same  documentation  from  the  dean  of  the 
University  of  Florida  medical  school,  who  in  turn  insisted  on  the  same 
from  the  chairmen  of  the  three  departments  in  the  medical  school.  All 
seven  signatures  were  indeed  obtained  by  the  PI,  and  permission  was 
thus  granted  to  request  the  individual  students'  and  residents' 
participation. 

Two  problems,  however,  remained.  First,  the  individuals  were 
still  free  to  refuse  participation  for  any  reason  they  might  have  had. 
Naturally,  this  allowed  any  naivete,  misinformation,  or  antipathy 
toward  research  or  psychology  on  their  part  to  emerge,  thus  presenting 
a formidable  public  relations  challenge  to  the  PI.  Second,  the  high 
turnover  (rotation)  rate  of  these  individuals  severely  limited  the 
usefulness of cooperative raters to the study, while also replacing
the uncooperative ones.

Veterans  Administration  regulations  prohibit  staff  from  accepting 
extra  remuneration  for  in-hospital  activities.  It  was  thus  not 
possible  to  offer  money  to  raters  for  participating,  in  either  a 
flat-fee or piecework arrangement. Ultimately, the only strategy
that  produced  real  rater  cooperation  was  the  continued  presence  of  the 
PI  on  the  wards,  which  led  to  the  establishment  of  some  working 
personal  relationships.  In  this  context,  it  was  possible  to 
alternately  stroke  and  push  each  rater  to  cooperate. 

In  a similar  vein  to  the  rater  issues,  the  choice  of  an  assistant 
was also a crucial one. His duties were not only to generate
simultaneous ratings with the two subject-present scales, but also to take
responsibility for the completion of data collection when the PI left
town to do a one-year clinical internship. Time proved the choice of
this individual to be a poor one, in that the assistant's reliability,
commitment to the project, motivation to excel, etc., proved
inadequate. His continued involvement, once his performance began to
deteriorate,  cost  the  PI  considerable  additional  time  and  money,  and 
still  the  assistant  did  not  fulfill  his  original  obligation  to 
complete  the  data  collection.  It  may  be  suggested  that  any  PI  who 
wants  things  done  correctly  should  either  do  them  himself,  or 
establish  a structure  that  allows  him  to  maintain  control  over  the 
reinforcers  for  small  units  of  appropriate  behavior  by  the  assistant. 
As  the  present  case  illustrates,  some  formalization  of  the  assistant's 
obligations  is  proper  and  desirable. 